It is often of interest to know if a simple Pooled Petersen estimator, i.e., complete pooling over rows and columns, is appropriate.
As noted in Schwarz and Taylor (1998), the Pooled Petersen is unbiased under many conditions, but the most common are:
We can examine the first of these conditions by examining the results of the stratified analysis and the results of a (logical) row pooling over all release strata.
This data were made available from the Canadian Department of Fisheries and Oceans and represent release and recaptured of female fish in the Lower Shuswap region.
test.data.csv <- textConnection("
86 , 54 , 39 , 219
76 , 35 , 45 , 168
24 , 53 , 73 , 190
1039 , 1148 , 2009 , 0")
test.data <- as.matrix(read.csv(test.data.csv, header=FALSE, strip.white=TRUE))
test.data
#> V1 V2 V3 V4
#> [1,] 86 54 39 219
#> [2,] 76 35 45 168
#> [3,] 24 53 73 190
#> [4,] 1039 1148 2009 0
We now fit several models
library(SPAS)
mod1 <- SPAS.fit.model(test.data,
model.id="No restrictions",
row.pool.in=1:3, col.pool.in=1:3)
#> Using nlminb to find conditional MLE
#> outer mgc: 1407.445
#> outer mgc: 4133.626
#> outer mgc: 1602.162
#> outer mgc: 237.747
#> outer mgc: 11.2797
#> outer mgc: 49.32575
#> outer mgc: 4.749331
#> outer mgc: 2.049442
#> outer mgc: 9.45052
#> outer mgc: 2.380802
#> outer mgc: 10.40597
#> outer mgc: 2.647875
#> outer mgc: 10.87054
#> outer mgc: 1.960908
#> outer mgc: 27.7445
#> outer mgc: 5.406239
#> outer mgc: 7.31579
#> outer mgc: 4.551982
#> outer mgc: 2.173492
#> outer mgc: 1.405963
#> outer mgc: 8.919649
#> outer mgc: 0.7700033
#> outer mgc: 3.907549
#> outer mgc: 1.077837
#> outer mgc: 1.705958
#> outer mgc: 0.9025297
#> outer mgc: 0.5799697
#> outer mgc: 0.2992383
#> outer mgc: 0.1649127
#> outer mgc: 0.08758061
#> outer mgc: 0.04651695
#> outer mgc: 0.02470727
#> outer mgc: 0.01312329
#> outer mgc: 0.006970476
#> outer mgc: 0.003702399
#> outer mgc: 0.001966548
#> outer mgc: 0.001044543
#> outer mgc: 0.0005548146
#> Convergence codes from nlminb 0 relative convergence (4)
#> Finding conditional estimate of N
SPAS.print.model(mod1)
#> Model Name: No restrictions
#> Date of Fit: 2019-12-01 21:22
#> Version of OPEN SPAS used : SPAS-R 2020-01-01
#>
#> Raw data
#> V1 V2 V3 V4
#> [1,] 86 54 39 219
#> [2,] 76 35 45 168
#> [3,] 24 53 73 190
#> [4,] 1039 1148 2009 0
#>
#> Row pooling setup : 1 2 3
#> Col pooling setup : 1 2 3
#> Physical pooling : TRUE
#> Theta pooling : FALSE
#> CJS pooling : FALSE
#>
#>
#> Chapman estimator of population size 10240 (SE 324 )
#>
#>
#> Raw data AFTER PHYSICAL (but not logical) POOLING
#> pool1 pool2 pool3 V4
#> pool1 86 54 39 219
#> pool2 76 35 45 168
#> pool3 24 53 73 190
#> 1039 1148 2009 0
#>
#> Condition number of XX' where X= (physically) pooled matrix is 123.8103
#> Condition number of XX' after logical pooling 123.8103
#>
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#>
#> Conditional Log-Likelihood: 30329.53 ; np: 15 ; AICc: -60629.07
#>
#> Code/Message from optimization is: 0 relative convergence (4)
#>
#> Estimates
#> pool1 pool2 pool3 psi cap.prob exp factor Pop Est
#> pool1 86.0 54.0 39.0 219 1.000 0.0 398
#> pool2 76.5 33.0 46.5 168 0.127 6.9 2558
#> pool3 24.5 44.7 80.7 190 0.046 20.8 7413
#> est unmarked 1038.0 1158.0 2000.0 0 NA NA 10369
#>
#> SE of above estimates
#> pool1 pool2 pool3 psi cap.prob exp factor Pop Est
#> pool1 9.3 7.3 6.2 14.8 0.000 0.0 0
#> pool2 8.7 5.4 7.0 13.0 0.035 2.2 711
#> pool3 5.0 4.3 7.5 13.8 0.006 2.8 953
#> est unmarked NA NA NA 0.0 NA NA 467
#>
#>
#> Chisquare gof cutoff : 0.1
#> Chisquare gof value : 2.581688
#> Chisquare gof df : 0
#> Chisquare gof p : NA
#3 Pooling over all rows using logical pooling
mod2 <- SPAS.fit.model(test.data,
model.id="Logical pooling to single row",
row.pool.in=c(1,1,1), col.pool.in=1:3, row.physical.pool=FALSE)
#> Using nlminb to find conditional MLE
#> outer mgc: 3908.402
#> outer mgc: 3442.019
#> outer mgc: 2339.248
#> outer mgc: 1225.938
#> outer mgc: 141.9137
#> outer mgc: 72.98649
#> outer mgc: 10.65131
#> outer mgc: 2.106192
#> outer mgc: 0.01326909
#> outer mgc: 2.850843e-06
#> Convergence codes from nlminb 0 relative convergence (4)
#> Finding conditional estimate of N
SPAS.print.model(mod2)
#> Model Name: Logical pooling to single row
#> Date of Fit: 2019-12-01 21:22
#> Version of OPEN SPAS used : SPAS-R 2020-01-01
#>
#> Raw data
#> V1 V2 V3 V4
#> [1,] 86 54 39 219
#> [2,] 76 35 45 168
#> [3,] 24 53 73 190
#> [4,] 1039 1148 2009 0
#>
#> Row pooling setup : 1 1 1
#> Col pooling setup : 1 2 3
#> Physical pooling : FALSE
#> Theta pooling : FALSE
#> CJS pooling : FALSE
#>
#>
#> Chapman estimator of population size 10240 (SE 324 )
#>
#>
#> Raw data AFTER PHYSICAL (but not logical) POOLING
#> pool1 pool2 pool3 V4
#> pool.1 86 54 39 219
#> pool.1 76 35 45 168
#> pool.1 24 53 73 190
#> 1039 1148 2009 0
#>
#> Condition number of XX' where X= (physically) pooled matrix is 123.8103
#> Condition number of XX' after logical pooling 1
#>
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#>
#> Conditional Log-Likelihood: 30304.4 ; np: 13 ; AICc: -60582.8
#>
#> Code/Message from optimization is: 0 relative convergence (4)
#>
#> Estimates
#> pool1 pool2 pool3 psi cap.prob exp factor Pop Est
#> pool.1 58.7 50.8 55.7 219 0.104 8.7 3841
#> pool.1 51.9 32.9 64.3 168 0.104 8.7 3127
#> pool.1 16.4 49.9 104.3 190 0.104 8.7 3282
#> est unmarked 1098.0 1156.0 1942.0 0 NA NA 10250
#>
#> SE of above estimates
#> pool1 pool2 pool3 psi cap.prob exp factor Pop Est
#> pool.1 5.5 6.0 8.2 14.8 0.004 0.4 165
#> pool.1 5.3 5.1 8.7 13.0 0.004 0.4 134
#> pool.1 3.2 6.0 10.2 13.8 0.004 0.4 141
#> est unmarked NA NA NA 0.0 NA NA 325
#>
#>
#> Chisquare gof cutoff : 0.1
#> Chisquare gof value : 53.85277
#> Chisquare gof df : 2
#> Chisquare gof p : 2.023111e-12
mod3 <- SPAS.fit.model(test.data,
model.id="Physical pooling to single row",
row.pool.in=c(1,1,1), col.pool.in=1:3)
#> Using nlminb to find conditional MLE
#> outer mgc: 4138.626
#> outer mgc: 1519.635
#> outer mgc: 267.9392
#> outer mgc: 25.4021
#> outer mgc: 0.7073742
#> outer mgc: 0.0009756013
#> outer mgc: 2.069271e-09
#> Convergence codes from nlminb 0 relative convergence (4)
#> Finding conditional estimate of N
SPAS.print.model(mod3)
#> Model Name: Physical pooling to single row
#> Date of Fit: 2019-12-01 21:22
#> Version of OPEN SPAS used : SPAS-R 2020-01-01
#>
#> Raw data
#> V1 V2 V3 V4
#> [1,] 86 54 39 219
#> [2,] 76 35 45 168
#> [3,] 24 53 73 190
#> [4,] 1039 1148 2009 0
#>
#> Row pooling setup : 1 1 1
#> Col pooling setup : 1 2 3
#> Physical pooling : TRUE
#> Theta pooling : FALSE
#> CJS pooling : FALSE
#>
#>
#> Chapman estimator of population size 10240 (SE 324 )
#>
#>
#> Raw data AFTER PHYSICAL (but not logical) POOLING
#> pool1 pool2 pool3 V4
#> 1 186 142 157 577
#> 1039 1148 2009 0
#>
#> Condition number of XX' where X= (physically) pooled matrix is 1
#> Condition number of XX' after logical pooling 1
#>
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#>
#> Conditional Log-Likelihood: 31438.32 ; np: 5 ; AICc: -62866.64
#>
#> Code/Message from optimization is: 0 relative convergence (4)
#>
#> Estimates
#> pool1 pool2 pool3 psi cap.prob exp factor Pop Est
#> 1 126.9 133.7 224.4 577 0.104 8.7 10250
#> est unmarked 1098.0 1156.0 1942.0 0 NA NA 10250
#>
#> SE of above estimates
#> pool1 pool2 pool3 psi cap.prob exp factor Pop Est
#> 1 6.6 6.8 10.8 24 0.004 0.4 325
#> est unmarked NA NA NA 0 NA NA 325
#>
#>
#> Chisquare gof cutoff : 0.1
#> Chisquare gof value : 53.85277
#> Chisquare gof df : 2
#> Chisquare gof p : 2.023111e-12
# do physical complete pooling
mod4 <- SPAS.fit.model(test.data,
model.id="Physical pooling all rows and last two colum ns",
row.pool.in=c(1,1,1), col.pool.in=c(1,1,3))
#> Using nlminb to find conditional MLE
#> outer mgc: 4133.875
#> outer mgc: 3617.132
#> outer mgc: 908.9971
#> outer mgc: 128.982
#> outer mgc: 5.742086
#> outer mgc: 0.02915585
#> outer mgc: 1.360541e-06
#> Convergence codes from nlminb 0 relative convergence (4)
#> Finding conditional estimate of N
SPAS.print.model(mod4)
#> Model Name: Physical pooling all rows and last two colum ns
#> Date of Fit: 2019-12-01 21:22
#> Version of OPEN SPAS used : SPAS-R 2020-01-01
#>
#> Raw data
#> V1 V2 V3 V4
#> [1,] 86 54 39 219
#> [2,] 76 35 45 168
#> [3,] 24 53 73 190
#> [4,] 1039 1148 2009 0
#>
#> Row pooling setup : 1 1 1
#> Col pooling setup : 1 1 3
#> Physical pooling : TRUE
#> Theta pooling : FALSE
#> CJS pooling : FALSE
#>
#>
#> Chapman estimator of population size 10240 (SE 324 )
#>
#>
#> Raw data AFTER PHYSICAL (but not logical) POOLING
#> pool1 pool3 V4
#> 1 328 157 577
#> 2187 2009 0
#>
#> Condition number of XX' where X= (physically) pooled matrix is 1
#> Condition number of XX' after logical pooling 1
#>
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#>
#> Conditional Log-Likelihood: 33180.75 ; np: 4 ; AICc: -66353.49
#>
#> Code/Message from optimization is: 0 relative convergence (4)
#>
#> Estimates
#> pool1 pool3 psi cap.prob exp factor Pop Est
#> 1 260.6 224.4 577 0.104 8.7 10250
#> est unmarked 2254.0 1942.0 0 NA NA 10250
#>
#> SE of above estimates
#> pool1 pool3 psi cap.prob exp factor Pop Est
#> 1 12.3 10.8 24 0.004 0.4 325
#> est unmarked NA NA 0 NA NA 325
#>
#>
#> Chisquare gof cutoff : 0.1
#> Chisquare gof value : 42.0552
#> Chisquare gof df : 1
#> Chisquare gof p : 8.873292e-11
# do physical complete pooling
mod5 <- SPAS.fit.model(test.data,
model.id="Physical complete pooling",
row.pool.in=c(1,1,1), col.pool.in=c(1,1,1))
#> Using nlminb to find conditional MLE
#> outer mgc: 4139.929
#> outer mgc: 12199
#> outer mgc: 3556.981
#> outer mgc: 739.2359
#> outer mgc: 92.68885
#> outer mgc: 5.542151
#> outer mgc: 0.03100836
#> outer mgc: 9.911684e-07
#> Convergence codes from nlminb 0 relative convergence (4)
#> Finding conditional estimate of N
SPAS.print.model(mod5)
#> Model Name: Physical complete pooling
#> Date of Fit: 2019-12-01 21:22
#> Version of OPEN SPAS used : SPAS-R 2020-01-01
#>
#> Raw data
#> V1 V2 V3 V4
#> [1,] 86 54 39 219
#> [2,] 76 35 45 168
#> [3,] 24 53 73 190
#> [4,] 1039 1148 2009 0
#>
#> Row pooling setup : 1 1 1
#> Col pooling setup : 1 1 1
#> Physical pooling : TRUE
#> Theta pooling : FALSE
#> CJS pooling : FALSE
#>
#>
#> Chapman estimator of population size 10240 (SE 324 )
#>
#>
#> Raw data AFTER PHYSICAL (but not logical) POOLING
#> 1 V4
#> 1 485 577
#> 4196 0
#>
#> Condition number of XX' where X= (physically) pooled matrix is 1
#> Condition number of XX' after logical pooling 1
#>
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#>
#> Conditional Log-Likelihood: 36412.34 ; np: 3 ; AICc: -72818.69
#>
#> Code/Message from optimization is: 0 relative convergence (4)
#>
#> Estimates
#> 1 psi cap.prob exp factor Pop Est
#> 1 485 577 0.104 8.7 10250
#> est unmarked 4196 0 NA NA 10250
#>
#> SE of above estimates
#> 1 psi cap.prob exp factor Pop Est
#> 1 22 24 0.004 0.4 325
#> est unmarked NA 0 NA NA 325
#>
#>
#> Chisquare gof cutoff : 0.1
#> Chisquare gof value : 2.025601e-15
#> Chisquare gof df : 0
#> Chisquare gof p : NA
model.list <- mget( ls()[grepl("^mod.$",ls())])
names(model.list)
#> [1] "mod1" "mod2" "mod3" "mod4" "mod5"
report <- plyr::ldply(model.list, function(x){
#browser()
data.frame(#version=x$version,
date = as.Date(x$date),
model.id = x$model.info$model.id,
s.a.pool =-1+nrow(x$fit.setup$pooldata),
t.p.pool =-1+ncol(x$fit.setup$pooldata),
logL.cond = x$model.info$logL.cond,
np = x$model.info$np,
AICc = x$model.info$AICc,
gof.chisq = round(x$gof$chisq,1),
gof.df = x$gof$chisq.df,
gof.p = round(x$gof$chisq.p,3),
Nhat = round(x$est$real$N),
Nhat.se = round(x$se $real$N))
})
report
#> .id date model.id s.a.pool
#> 1 mod1 2019-12-02 No restrictions 3
#> 2 mod2 2019-12-02 Logical pooling to single row 3
#> 3 mod3 2019-12-02 Physical pooling to single row 1
#> 4 mod4 2019-12-02 Physical pooling all rows and last two colum ns 1
#> 5 mod5 2019-12-02 Physical complete pooling 1
#> t.p.pool logL.cond np AICc gof.chisq gof.df gof.p Nhat Nhat.se
#> 1 3 30329.53 15 -60629.07 2.6 0 NA 10369 467
#> 2 3 30304.40 13 -60582.80 53.9 2 0 10250 325
#> 3 3 31438.32 5 -62866.64 53.9 2 0 10250 325
#> 4 2 33180.75 4 -66353.49 42.1 1 0 10250 325
#> 5 1 36412.34 3 -72818.69 0.0 0 NA 10250 325
The AIC should be compared ONLY for the first two models because they are based on the same set of data. You cannot compare models that differ in the physical pooling
In this case, there is good evidence that the Pooled Petersen is too coarse because the goodness of fit statistic for the second model is very large (with a corresponding small goodness-of-fit p-value). Similarly, the AIC indicates that the model is 3x3 stratification (first model) is preferable to the model with complete row pooling (second model).
Notice that the estimates of the population size are identical under logical or physical row pooling (models 2 and 3). And how you pool columns (models 3, 4, 5) but assuming that the number of rows (after logical or physical pooling as long the number of rows is not larger than the number of columns) does not affect the population size estimate (or standard error).
Darroch, J. N. (1961). The two-sample capture-recapture census when tagging and sampling are stratified. Biometrika, 48, 241–260. https://www.jstor.org/stable/2332748
Plante, N., L.-P Rivest, and G. Tremblay. (1988). Stratified Capture-Recapture Estimation of the Size of a Closed Population. Biometrics 54, 47-60. https://www.jstor.org/stable/2533994
Schwarz, C. J., & Taylor, C. G. (1998). The use of the stratified-Petersen estimator in fisheries management with an illustration of estimating the number of pink salmon (Oncorhynchus gorbuscha) that return to spawn in the Fraser River. Canadian Journal of Fisheries and Aquatic Sciences, 55, 281–296. https://doi.org/10.1139/f97-238