This analyzes a dataset from the Conne River, Newfoundland collected in 1991. Smolts are tagged and released at one location in the river. A fence at a second station also captures fish. Data are stratified on a weekly level.
The Conne River 1992 vignette has larger sample sizes.
A stratified-Petersen estimator (Darroch, 1961; Plante et al. 1996) is found for the original data and some pooling of the rows/columns.
library(SPAS)
#> Warning in checkMatrixPackageVersion(): Package version inconsistency detected.
#> TMB was built with Matrix version 1.2.17
#> Current Matrix version is 1.2.18
#> Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package
#> ***** SPAS: Stratified Petersen Analysis System - Version 2020.1.1 2020-01-01) *****
#>
#> Help available with help(package='SPAS')
#> Several vignettes are available. See browseVignettes(package="SPAS")
#>
#> ***** CAUTION ***** Output object has significantly changed since last release ****
#> Use SPAS.print.legacy() to display previously created objects
This will load the SPAS fitting functions and any associated packages needed by SPAS.
The data should be stored as an \(s+1\) by \(t+1\) (before pooling) matrix where the upper \(s \times t\) matrix consists of the The \(s \times t\) upper left matrix is the number of animals released in row stratum \(i\) and recovered in column stratum \(j\). Row \(s+1\) contains the total number of UNMARKED animals recovered in column stratum \(j\). Column \(t+1\) contains the number of animals marked in each row stratum but not recovered in any column stratum. The \(rawdata[s+1, t+1]\) is not used and can be set to 0 or NA. The sum of the entries in each of the first \(s\) rows is then the number of animals marked in each row stratum. The sum of the entries in each of the first \(t\) columns is then the number of animals captured (marked and unmarked) in each column stratum. The row/column names of the matrix may be set to identify the entries in the output.
Here is the raw data for the Conne River. Notice that very small number of releases and recoveries in week 6.
conne.data.csv <- textConnection("
9 , 21 , 0 , 0 , 0 , 0 , 171
0 , 101 , 22 , 1 , 0 , 0 , 763
0 , 0 , 128 , 49 , 0 , 0 , 934
0 , 0 , 0 , 48 , 12 , 0 , 434
0 , 0 , 0 , 0 , 7 , 0 , 49
0 , 0 , 0 , 0 , 0 , 0 , 4
351, 2736 , 3847 , 1818 , 543 , 191 , 0")
conne.data <- as.matrix(read.csv(conne.data.csv, header=FALSE))
A total of 201 fish were tagged and released in week 1. Of these fish, 9 were recovered down stream in column stratum 1; 21 were recovered in column stratum 2; and 171 were never seen again. A total of 351 UNMARKED fish were recovered in column stratum 1.
Notice the VERY small sample size in row 6 with only 4 fish released and non-recaptured. We will likely have to pool some rows in this problem because there is virtually no information provided by row 6 which makes the problem ill conditioned and may not converge properly.
You can add row and column names to the matrix which will used when the data are printed.
The Stratified-Petersen model can now be fit to the above data object.
mod1 <- SPAS::SPAS.fit.model(conne.data,
model.id="No restrictions",
row.pool.in=1:6, col.pool.in=1:6)
The row.pool.in
and col.pool.in
inform the function on which rows or columns to physically pool before the analysis proceeds. Both parameters use a vector of codes (length \(s\) for row pooing and length \(t\) for column pooling) where rows/columns are pooled that share the same value.
For example. row.pool.in=c(1,2,3,4,5,6)
would imply that no rows are pooled, while row.pool.in=c('a','a','a','b','b','b')
would imply that the first three rows and last three rows are pooled. The entries in the vector can be numeric or character; however, using character entries implies that the final pooled matrix is displayed in the order of the character entries. I find that using entries such as '123'
to represent pooling rows 1, 2, and 3 to be easiest to use.
The SPAS system only fits models were the number of rows after pooling is less than or equal to the number of columns after pooling.
The likelihood is rather flat, so don’t be too concerned about messages from the optimization on failing to converge. As long as the likelihood values seem to be have stabilized and estimates are sensible (i.e. capture rates not 0 or 1; estimates of movement sensible results), the estimates should be fine.
The results of the model fit is a LARGE list. But the function SPAS.print.model
produces a nice report
SPAS.print.model(mod1)
#> Model Name: No restrictions
#> Date of Fit: 2019-12-01 21:22
#> Version of OPEN SPAS used : SPAS-R 2020-01-01
#>
#> Raw data
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 9 21 0 0 0 0 171
#> [2,] 0 101 22 1 0 0 763
#> [3,] 0 0 128 49 0 0 934
#> [4,] 0 0 0 48 12 0 434
#> [5,] 0 0 0 0 7 0 49
#> [6,] 0 0 0 0 0 0 4
#> [7,] 351 2736 3847 1818 543 191 0
#>
#> Row pooling setup : 1 2 3 4 5 6
#> Col pooling setup : 1 2 3 4 5 6
#> Physical pooling : TRUE
#> Theta pooling : FALSE
#> CJS pooling : FALSE
#>
#>
#> Chapman estimator of population size 68228 (SE 3090 )
#>
#>
#> Raw data AFTER PHYSICAL (but not logical) POOLING
#> pool1 pool2 pool3 pool4 pool5 pool6 V7
#> pool1 9 21 0 0 0 0 171
#> pool2 0 101 22 1 0 0 763
#> pool3 0 0 128 49 0 0 934
#> pool4 0 0 0 48 12 0 434
#> pool5 0 0 0 0 7 0 49
#> pool6 0 0 0 0 0 0 4
#> 351 2736 3847 1818 543 191 0
#>
#> Condition number of XX' where X= (physically) pooled matrix is Inf
#> Condition number of XX' after logical pooling Inf
#>
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#>
#> Conditional Log-Likelihood: 78115.99 ; np: 48 ; AICc: -156136
#>
#> Code/Message from optimization is: 1 singular convergence (7)
#>
#> Estimates
#> pool1 pool2 pool3 pool4 pool5 pool6 psi cap.prob exp factor
#> pool1 8.9 21.1 0.0 0.0 0.0 0.0 171 0.025 39.3
#> pool2 0.0 101.2 21.8 1.0 0.0 0.0 763 0.050 18.9
#> pool3 0.0 0.0 126.5 50.5 0.0 0.0 934 0.035 27.2
#> pool4 0.0 0.0 0.0 48.5 11.5 0.0 434 0.103 8.7
#> pool5 0.0 0.0 0.0 0.0 4.9 2.1 49 0.011 90.5
#> pool6 0.0 0.0 0.0 0.0 0.0 0.0 4 0.269 2.7
#> est unmarked 351.0 2736.0 3849.0 1816.0 546.0 189.0 0 NA NA
#> Pop Est
#> pool1 8103
#> pool2 17611
#> pool3 31308
#> pool4 4808
#> pool5 5126
#> pool6 15
#> est unmarked 66971
#>
#> SE of above estimates
#> pool1 pool2 pool3 pool4 pool5 pool6 psi cap.prob exp factor
#> pool1 3 4.6 0.0 0.0 0.0 0.0 13.1 0.008 13.4
#> pool2 0 10.0 4.6 1.0 0.0 0.0 27.6 0.010 3.8
#> pool3 0 0.0 11.2 7.1 0.0 0.0 30.6 0.003 2.6
#> pool4 0 0.0 0.0 6.9 3.3 0.0 20.8 0.053 5.0
#> pool5 0 0.0 0.0 0.0 1.9 0.8 7.0 0.004 35.5
#> pool6 0 0.0 0.0 0.0 0.0 0.0 2.0 0.000 0.0
#> est unmarked NA NA NA NA NA NA 0.0 NA NA
#> Pop Est
#> pool1 2688
#> pool2 3405
#> pool3 2939
#> pool4 2482
#> pool5 1991
#> pool6 0
#> est unmarked 3496
#>
#>
#> Chisquare gof cutoff : 0.1
#> Chisquare gof value : 3.102442
#> Chisquare gof df : 0
#> Chisquare gof p : NA
The original data, the data after pooling, estimates and their standard errors are shown. Here the stratified-Petersen estimate of the total number of smolts passing the at the first sampling station is 66,971 with a standard error of 3,496.
Each entry in the output is available in fitted model object. You will need to explore the structure
cat("Names of objects at highest level\n")
#> Names of objects at highest level
names(mod1)
#> [1] "version" "date" "input" "kappa"
#> [5] "kappa.after.lp" "est" "vcv" "se"
#> [9] "fit.setup" "conditional" "model.info" "gof"
cat("\n\nNames of estimates (both beta and real)\n")
#>
#>
#> Names of estimates (both beta and real)
names(mod1$est)
#> [1] "N.Chapman" "beta" "real"
cat("\n\nNames of real estimates\n")
#>
#>
#> Names of real estimates
names(mod1$est$real)
#> [1] "psi" "theta" "cap" "all.expanded" "N"
#> [6] "N.stratum" "exp.factor"
The N
entries refer to the population size; the N.stratum
entries refer to the individual stratum population sizes; the cap
entries refer to the estimated probability of capture in each row stratum; the exp.factor
entries refer to (1-cap)/cap
, or the expansion factor for each row; the psi
entries refer to the number of animals tagged but never seen again (the right most column in the input data); and the theta
entries refer to the expected number of animals that were tagged in row stratum \(i\) and recovered in column stratum \(j\) (after pooling).
As noted by Darroch (1961), the stratified-Petersen will fail if the matrix of movements is close to singular. This often happens if two rows are proportional to each other. In this case, there is no unique MLE for the probability of capture in the last two rows, and they should be pooled. A detailed discussion of pooling is found in Schwarz and Taylor (1998).
Let us now pool the first two rows and the last two rows, but leave rows 3 and 4 alone. Similar, let us pool the last two columns.
The code is
mod2 <- SPAS.fit.model(conne.data, model.id="Pooling some rows",
row.pool.in=c("12","12","3","4","56","56"),
col.pool.in=c(1,2,3,4,56,56))
Notice how we specify the pooling for rows and columns and the choice of entries for the two corresponding vectors. I used character entries for the row pooling to ensure that the pooled matrix is sorted properly (see below); the numeric entries for the columns is ok as is.
The results of the model fit is
SPAS.print.model(mod2)
#> Model Name: Pooling some rows
#> Date of Fit: 2019-12-01 21:22
#> Version of OPEN SPAS used : SPAS-R 2020-01-01
#>
#> Raw data
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 9 21 0 0 0 0 171
#> [2,] 0 101 22 1 0 0 763
#> [3,] 0 0 128 49 0 0 934
#> [4,] 0 0 0 48 12 0 434
#> [5,] 0 0 0 0 7 0 49
#> [6,] 0 0 0 0 0 0 4
#> [7,] 351 2736 3847 1818 543 191 0
#>
#> Row pooling setup : 12 12 3 4 56 56
#> Col pooling setup : 1 2 3 4 56 56
#> Physical pooling : TRUE
#> Theta pooling : FALSE
#> CJS pooling : FALSE
#>
#>
#> Chapman estimator of population size 68228 (SE 3090 )
#>
#>
#> Raw data AFTER PHYSICAL (but not logical) POOLING
#> pool1 pool2 pool3 pool4 pool56 V7
#> pool12 9 122 22 1 0 934
#> pool3 0 0 128 49 0 934
#> pool4 0 0 0 48 12 434
#> pool56 0 0 0 0 7 53
#> 351 2736 3847 1818 734 0
#>
#> Condition number of XX' where X= (physically) pooled matrix is 541.5353
#> Condition number of XX' after logical pooling 541.5353
#>
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#>
#> Conditional Log-Likelihood: 79052.91 ; np: 28 ; AICc: -158049.8
#>
#> Code/Message from optimization is: 0 relative convergence (4)
#>
#> Estimates
#> pool1 pool2 pool3 pool4 pool56 psi cap.prob exp factor Pop Est
#> pool12 11.7 119.3 21.9 1.0 0.0 934 0.042 23.0 26059
#> pool3 0.0 0.0 127.6 49.4 0.0 934 0.037 26.2 30230
#> pool4 0.0 0.0 0.0 48.2 11.8 434 0.088 10.4 5615
#> pool56 0.8 0.0 0.0 0.0 6.2 53 0.010 98.6 5978
#> est unmarked 347.0 2739.0 3847.0 1817.0 735.0 0 NA NA 67882
#>
#> SE of above estimates
#> pool1 pool2 pool3 pool4 pool56 psi cap.prob exp factor Pop Est
#> pool12 4.2 11.2 4.7 1.0 0.0 30.6 0.004 2.2 2402
#> pool3 0.0 0.0 11.3 7.0 0.0 30.6 0.003 2.6 2837
#> pool4 0.0 0.0 0.0 6.9 3.4 20.8 0.039 5.0 2458
#> pool56 1.0 0.0 0.0 0.0 2.5 7.3 0.004 42.1 2524
#> est unmarked NA NA NA NA NA 0.0 NA NA 3593
#>
#>
#> Chisquare gof cutoff : 0.1
#> Chisquare gof value : 1.637457
#> Chisquare gof df : 1
#> Chisquare gof p : 0.2006747
Here the stratified-Petersen estimate of the total number of smolts passing the at the first sampling station is 67,882 with a standard error of 3,593. which is a slight reduction from the unpooled estimates.
You can pool to a single row (and multiple columns) or a single row and single column (which is equivalent to the pooled Petersen estimator). The code and output follow:
mod3 <- SPAS.fit.model(conne.data, model.id="A single row",
row.pool.in=rep(1, nrow(conne.data)-1),
col.pool.in=c(1,2,3,4,56,56))
SPAS.print.model(mod3)
#> Model Name: A single row
#> Date of Fit: 2019-12-01 21:22
#> Version of OPEN SPAS used : SPAS-R 2020-01-01
#>
#> Raw data
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 9 21 0 0 0 0 171
#> [2,] 0 101 22 1 0 0 763
#> [3,] 0 0 128 49 0 0 934
#> [4,] 0 0 0 48 12 0 434
#> [5,] 0 0 0 0 7 0 49
#> [6,] 0 0 0 0 0 0 4
#> [7,] 351 2736 3847 1818 543 191 0
#>
#> Row pooling setup : 1 1 1 1 1 1
#> Col pooling setup : 1 2 3 4 56 56
#> Physical pooling : TRUE
#> Theta pooling : FALSE
#> CJS pooling : FALSE
#>
#>
#> Chapman estimator of population size 68228 (SE 3090 )
#>
#>
#> Raw data AFTER PHYSICAL (but not logical) POOLING
#> pool1 pool2 pool3 pool4 pool56 V7
#> 1 9 122 150 98 19 2355
#> 351 2736 3847 1818 734 0
#>
#> Condition number of XX' where X= (physically) pooled matrix is 1
#> Condition number of XX' after logical pooling 1
#>
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#>
#> Conditional Log-Likelihood: 81857.54 ; np: 7 ; AICc: -163701.1
#>
#> Code/Message from optimization is: 0 both X-convergence and relative convergence (5)
#>
#> Estimates
#> pool1 pool2 pool3 pool4 pool56 psi cap.prob exp factor Pop Est
#> 1 14.5 115.1 160.9 77.2 30.3 2355 0.04 23.8 68368
#> est unmarked 346.0 2743.0 3836.0 1839.0 723.0 0 NA NA 68368
#>
#> SE of above estimates
#> pool1 pool2 pool3 pool4 pool56 psi cap.prob exp factor Pop Est
#> 1 1 6 8.3 4.2 1.9 48.5 0.002 1.2 3105
#> est unmarked NA NA NA NA NA 0.0 NA NA 3105
#>
#>
#> Chisquare gof cutoff : 0.1
#> Chisquare gof value : 13.65473
#> Chisquare gof df : 4
#> Chisquare gof p : 0.008482579
mod4 <- SPAS.fit.model(conne.data, model.id="Pooled Peteren",
row.pool.in=rep(1, nrow(conne.data)-1),
col.pool.in=rep(1, ncol(conne.data)-1))
SPAS.print.model(mod4)
#> Model Name: Pooled Peteren
#> Date of Fit: 2019-12-01 21:22
#> Version of OPEN SPAS used : SPAS-R 2020-01-01
#>
#> Raw data
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 9 21 0 0 0 0 171
#> [2,] 0 101 22 1 0 0 763
#> [3,] 0 0 128 49 0 0 934
#> [4,] 0 0 0 48 12 0 434
#> [5,] 0 0 0 0 7 0 49
#> [6,] 0 0 0 0 0 0 4
#> [7,] 351 2736 3847 1818 543 191 0
#>
#> Row pooling setup : 1 1 1 1 1 1
#> Col pooling setup : 1 1 1 1 1 1
#> Physical pooling : TRUE
#> Theta pooling : FALSE
#> CJS pooling : FALSE
#>
#>
#> Chapman estimator of population size 68228 (SE 3090 )
#>
#>
#> Raw data AFTER PHYSICAL (but not logical) POOLING
#> 1 V7
#> 1 398 2355
#> 9486 0
#>
#> Condition number of XX' where X= (physically) pooled matrix is 1
#> Condition number of XX' after logical pooling 1
#>
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#>
#> Conditional Log-Likelihood: 95297.26 ; np: 3 ; AICc: -190588.5
#>
#> Code/Message from optimization is: 0 relative convergence (4)
#>
#> Estimates
#> 1 psi cap.prob exp factor Pop Est
#> 1 398 2355 0.04 23.8 68368
#> est unmarked 9486 0 NA NA 68368
#>
#> SE of above estimates
#> 1 psi cap.prob exp factor Pop Est
#> 1 19.9 48.5 0.002 1.2 3105
#> est unmarked NA 0.0 NA NA 3105
#>
#>
#> Chisquare gof cutoff : 0.1
#> Chisquare gof value : 8.319195e-16
#> Chisquare gof df : 0
#> Chisquare gof p : NA
Both models have an estimated abundance of 68,368 with a standard error of 3,105. which is a slight reduction from the unpooled estimates.
Unfortunately, because pooling actually changes the data (internally), it is not possible to do likelihood ratio testing or to use AICc to compare different poolings. Methods to compare different poolings is under active investigation.
In logical pooling, we do not physically pool rows or columns. As noted elsewhere, it is not possible to implement logical pooling for columns, so only row will be discussed here. In logical pooling, the capture probabilities in the stratum of release are forced equal. Logical pooling is specified using the same keywords for physical pooling, but the additional argument row.physical.pooling=FALSE should be set.
Let us now logically pool the first two rows and the last two rows, but leave rows 3 and 4 alone. Similar, let us physically pool the last two columns.
The code is
mod5 <- SPAS.fit.model(conne.data, model.id="Pooling some rows",
row.pool.in=c("12","12","3","4","56","56"),
row.physical.pool=FALSE,
col.pool.in=c(1,2,3,4,56,56))
This gives:
SPAS.print.model(mod5)
#> Model Name: Pooling some rows
#> Date of Fit: 2019-12-01 21:22
#> Version of OPEN SPAS used : SPAS-R 2020-01-01
#>
#> Raw data
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 9 21 0 0 0 0 171
#> [2,] 0 101 22 1 0 0 763
#> [3,] 0 0 128 49 0 0 934
#> [4,] 0 0 0 48 12 0 434
#> [5,] 0 0 0 0 7 0 49
#> [6,] 0 0 0 0 0 0 4
#> [7,] 351 2736 3847 1818 543 191 0
#>
#> Row pooling setup : 12 12 3 4 56 56
#> Col pooling setup : 1 2 3 4 56 56
#> Physical pooling : FALSE
#> Theta pooling : FALSE
#> CJS pooling : FALSE
#>
#>
#> Chapman estimator of population size 68228 (SE 3090 )
#>
#>
#> Raw data AFTER PHYSICAL (but not logical) POOLING
#> pool1 pool2 pool3 pool4 pool56 V7
#> pool.12 9 21 0 0 0 171
#> pool.12 0 101 22 1 0 763
#> pool.3 0 0 128 49 0 934
#> pool.4 0 0 0 48 12 434
#> pool.56 0 0 0 0 7 49
#> pool.56 0 0 0 0 0 4
#> 351 2736 3847 1818 734 0
#>
#> Condition number of XX' where X= (physically) pooled matrix is Inf
#> Condition number of XX' after logical pooling 541.5353
#>
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#>
#> Conditional Log-Likelihood: 78538.08 ; np: 40 ; AICc: -156996.2
#>
#> Code/Message from optimization is: 1 singular convergence (7)
#>
#> Estimates
#> pool1 pool2 pool3 pool4 pool56 psi cap.prob exp factor Pop Est
#> pool.12 11.7 20.5 0.0 0.0 0.0 171 0.042 23.0 4814
#> pool.12 0.0 98.8 21.9 1.0 0.0 763 0.042 23.0 21245
#> pool.3 0.0 0.0 127.6 49.4 0.0 934 0.037 26.2 30230
#> pool.4 0.0 0.0 0.0 48.2 11.8 434 0.088 10.4 5615
#> pool.56 0.4 0.0 0.0 0.0 6.2 49 0.010 98.6 5580
#> pool.56 0.4 0.0 0.0 0.0 0.0 4 0.010 98.6 399
#> est unmarked 347.0 2739.0 3847.0 1817.0 735.0 0 NA NA 67882
#>
#> SE of above estimates
#> pool1 pool2 pool3 pool4 pool56 psi cap.prob exp factor Pop Est
#> pool.12 4.2 4.5 0.0 0.0 0.0 13.1 0.004 2.2 444
#> pool.12 0.0 10.1 4.7 1.0 0.0 27.6 0.004 2.2 1958
#> pool.3 0.0 0.0 11.3 7.0 0.0 30.6 0.003 2.6 2837
#> pool.4 0.0 0.0 0.0 6.9 3.4 20.8 0.039 5.0 2458
#> pool.56 0.5 0.0 0.0 0.0 2.5 7.0 0.004 42.1 2356
#> pool.56 0.5 0.0 0.0 0.0 0.0 2.0 0.004 42.1 168
#> est unmarked NA NA NA NA NA 0.0 NA NA 3593
#>
#>
#> Chisquare gof cutoff : 0.1
#> Chisquare gof value : 1.637457
#> Chisquare gof df : 1
#> Chisquare gof p : 0.2006747
Darroch, J. N. (1961). The two-sample capture-recapture census when tagging and sampling are stratified. Biometrika, 48, 241–260. https://www.jstor.org/stable/2332748
Plante, N., L.-P Rivest, and G. Tremblay. (1988). Stratified Capture-Recapture Estimation of the Size of a Closed Population. Biometrics 54, 47-60. https://www.jstor.org/stable/2533994
Schwarz, C. J., & Taylor, C. G. (1998). The use of the stratified-Petersen estimator in fisheries management with an illustration of estimating the number of pink salmon (Oncorhynchus gorbuscha) that return to spawn in the Fraser River. Canadian Journal of Fisheries and Aquatic Sciences, 55, 281–296. https://doi.org/10.1139/f97-238