Basic unit-level models

The basic unit-level model (Battese, Harter, and Fuller 1988; Rao and Molina 2015) is given by \[ y_j = \beta' x_j + v_{i[j]} + \epsilon_j\,,\\ v_i \stackrel{\mathrm{iid}}{\sim} {\cal N}(0, \sigma_v^2) \qquad \epsilon_j \stackrel{\mathrm{iid}}{\sim} {\cal N}(0, \sigma^2) \] where \(j\) runs from 1 to \(n\), the number of unit-level observations, \(\beta\) is a vector of regression coefficients for given covariates \(x_j\), and \(v_i\) are random area intercepts.

The apipop data frame contains the complete population whereas apisrs is a simple random sample from it. The variable cname is the county name, and we will be interested in estimation at the county level. Not all counties in the population are sampled. In order to be able to make predictions for out-of-sample areas we make sure that the levels of the sample’s cname variable match those of its population counterpart.

We want to estimate the area population means \[ \theta_i = \frac{1}{N_i}\sum_{j \in U_i} y_j\,, \] where \(U_i\) is the set of units in area \(i\) of size \(N_i\). The MCMC output in variable sim can be used to obtain draws from the posterior distribution for \(\theta_i\). The \(r\)th draw can be expressed as \[ \theta_{i;r} = \frac{1}{N_i} \left(n_i \bar{y}_i + \beta_r'(t_{x;i} - n_i \bar{x}_i) + (N_i - n_i)v_{i;r} + \sum_{j \in U_i\setminus s_i} \epsilon_{j;r} \right)\,, \] where \(\bar{y}_i\) is the sample mean of \(y\) in area \(i\) and \(t_{x;i}\) is a vector of population totals for area \(i\).

Binomial Unit-Level Model

A model with binomial likelihood can also be fit. We now model the target variable sch.wide, a binary variable indicating whether a school-wide growth target has been met. We use the same mean model structure as above for the linear model, but now using a logistic link function, \[ y_j \stackrel{\mathrm{iid}}{\sim} {\cal Be}(p_j)\,,\\ \mathrm{logit}(p_j) = \beta' x_j + v_{i[j]}\,,\\ v_i \stackrel{\mathrm{iid}}{\sim} {\cal N}(0, \sigma_v^2) \]

apisrs$target.met <- as.numeric(apisrs$sch.wide == "Yes")
sampler <- create_sampler(update(mod, target.met ~ .), family="binomial", data=apisrs)
sim <- MCMCsim(sampler, store.all=TRUE, verbose=FALSE)
summary(sim)

## llh_ :
##       Mean   SD t-value   MCSE q0.05  q0.5 q0.95 n_eff R_hat
## llh_ -83.3 2.49   -33.4 0.0607 -87.5 -83.2 -79.4  1685     1
## 
## beta :
##                  Mean     SD t-value     MCSE   q0.05      q0.5    q0.95 n_eff R_hat
## (Intercept)  4.563445 1.4934  3.0558 0.058914  2.2238  4.480713  7.14145   643  1.00
## ell         -0.035823 0.0150 -2.3838 0.000420 -0.0612 -0.035212 -0.01215  1280  1.00
## meals        0.000808 0.0141  0.0572 0.000454 -0.0223  0.000659  0.02462   971  1.00
## stypeH      -2.821852 0.6210 -4.5441 0.019304 -3.8534 -2.800611 -1.85445  1035  1.00
## stypeM      -1.651438 0.5873 -2.8118 0.016695 -2.5977 -1.644618 -0.71640  1238  1.00
## hsg         -0.025548 0.0223 -1.1470 0.000657 -0.0626 -0.025645  0.01055  1148  1.00
## col.grad    -0.037802 0.0269 -1.4045 0.000868 -0.0816 -0.037805  0.00687   961  1.01
## grad.sch     0.014016 0.0316  0.4434 0.001113 -0.0347  0.012320  0.06888   807  1.00
## 
## v_sigma :
##          Mean    SD t-value   MCSE  q0.05 q0.5 q0.95 n_eff R_hat
## v_sigma 0.375 0.286    1.31 0.0123 0.0297 0.32 0.917   535  1.01
## 
## v :
##                   Mean    SD  t-value    MCSE  q0.05     q0.5 q0.95 n_eff R_hat
## Alameda      -0.190148 0.433 -0.43866 0.01363 -1.000 -0.07204 0.338  1012 1.003
## Amador        0.004611 0.489  0.00943 0.00915 -0.771  0.00267 0.762  2856 1.000
## Butte        -0.000808 0.478 -0.00169 0.00872 -0.759 -0.00202 0.785  3000 0.999
## Calaveras     0.035399 0.465  0.07609 0.00914 -0.675  0.00547 0.845  2588 0.999
## Colusa       -0.010062 0.462 -0.02176 0.00846 -0.797 -0.00139 0.725  2991 1.000
## Contra Costa  0.005358 0.404  0.01327 0.00940 -0.666  0.00159 0.674  1846 1.002
## Del Norte    -0.007185 0.479 -0.01501 0.00887 -0.792 -0.00109 0.734  2910 1.000
## El Dorado     0.004596 0.451  0.01019 0.00838 -0.729  0.00206 0.721  2896 1.000
## Fresno       -0.042824 0.383 -0.11190 0.00892 -0.694 -0.01627 0.544  1841 1.002
## Glenn        -0.014269 0.476 -0.02995 0.00871 -0.804 -0.00148 0.730  2994 1.000
## ... 47 elements suppressed ...

To predict the population fractions of schools that meet the growth target by county,

samplesums <- tapply(apisrs$target.met, apisrs$cname, sum)
samplesums[is.na(samplesums)] <- 0  # substitute 0 for out-of-sample areas
res <- predict(sim, newdata=apipop, labels=names(N),
         fun=function(x, p) (samplesums + tapply(x[-m], apipop$cname[-m], sum ))/N,
         show.progress=FALSE)
(summ <- summary(res))

##                  Mean     SD t-value     MCSE q0.05  q0.5 q0.95 n_eff R_hat
## Alameda         0.784 0.0615   12.74 0.001650 0.670 0.792 0.864  1390 1.002
## Amador          0.837 0.1171    7.14 0.002182 0.600 0.800 1.000  2880 1.000
## Butte           0.835 0.0755   11.06 0.001548 0.708 0.833 0.938  2377 1.000
## Calaveras       0.875 0.0953    9.18 0.001884 0.700 0.900 1.000  2558 1.000
## Colusa          0.645 0.1577    4.09 0.002951 0.333 0.667 0.889  2856 1.001
## Contra Costa    0.817 0.0570   14.32 0.001349 0.721 0.821 0.899  1786 1.001
## Del Norte       0.881 0.1079    8.16 0.002191 0.750 0.875 1.000  2424 1.001
## El Dorado       0.850 0.0758   11.23 0.001521 0.725 0.850 0.950  2482 1.002
## Fresno          0.781 0.0618   12.65 0.001463 0.667 0.790 0.871  1783 1.001
## Glenn           0.790 0.1415    5.58 0.002717 0.556 0.778 1.000  2713 1.002
## Humboldt        0.850 0.0746   11.39 0.001533 0.725 0.850 0.950  2369 1.000
## Imperial        0.699 0.1059    6.60 0.002309 0.500 0.700 0.850  2103 1.002
## Inyo            0.777 0.1584    4.91 0.003087 0.571 0.857 1.000  2635 1.000
## Kern            0.804 0.0523   15.36 0.001232 0.706 0.811 0.878  1805 1.000
## Kings           0.850 0.0791   10.74 0.001654 0.720 0.840 0.960  2289 1.001
## Lake            0.825 0.0957    8.62 0.001837 0.636 0.818 0.955  2717 1.002
## Lassen          0.834 0.1100    7.58 0.002184 0.636 0.818 1.000  2534 1.000
## Los Angeles     0.799 0.0431   18.56 0.001040 0.728 0.799 0.869  1714 1.000
## Madera          0.820 0.0753   10.89 0.001539 0.677 0.839 0.935  2391 1.001
## Marin           0.837 0.0707   11.84 0.001705 0.700 0.840 0.940  1721 1.002
## Mariposa        0.860 0.1517    5.67 0.003097 0.600 0.800 1.000  2400 1.002
## Mendocino       0.815 0.0905    9.01 0.001700 0.640 0.840 0.960  2834 0.999
## Merced          0.787 0.0826    9.53 0.001727 0.635 0.794 0.905  2289 1.001
## Modoc           0.832 0.1578    5.27 0.002998 0.600 0.800 1.000  2769 0.999
## Mono            0.748 0.2435    3.07 0.004572 0.333 0.667 1.000  2837 1.000
## Monterey        0.803 0.0682   11.78 0.001688 0.687 0.807 0.916  1633 1.001
## Napa            0.845 0.0790   10.70 0.001538 0.704 0.852 0.963  2635 1.000
## Nevada          0.866 0.0961    9.01 0.001865 0.714 0.857 1.000  2655 0.999
## Orange          0.774 0.0611   12.68 0.001364 0.672 0.775 0.873  2004 1.000
## Placer          0.813 0.0761   10.68 0.001723 0.682 0.818 0.909  1950 0.999
## Plumas          0.738 0.1558    4.74 0.002921 0.444 0.778 1.000  2844 0.999
## Riverside       0.824 0.0527   15.64 0.001133 0.729 0.831 0.895  2163 1.000
## Sacramento      0.847 0.0493   17.20 0.001286 0.764 0.847 0.924  1467 1.001
## San Benito      0.816 0.1262    6.47 0.002513 0.545 0.818 1.000  2522 1.000
## San Bernardino  0.862 0.0422   20.43 0.001159 0.793 0.862 0.934  1326 1.005
## San Diego       0.848 0.0427   19.85 0.000988 0.775 0.850 0.913  1870 1.002
## San Francisco   0.761 0.0740   10.28 0.001524 0.630 0.770 0.870  2359 0.999
## San Joaquin     0.827 0.0576   14.36 0.001311 0.726 0.831 0.911  1932 1.001
## San Luis Obispo 0.854 0.0731   11.69 0.001469 0.725 0.850 0.950  2474 0.999
## San Mateo       0.796 0.0746   10.67 0.001915 0.664 0.804 0.888  1519 1.001
## Santa Barbara   0.828 0.0669   12.39 0.001645 0.716 0.840 0.926  1652 1.001
## Santa Clara     0.782 0.0729   10.73 0.002138 0.645 0.796 0.871  1162 1.000
## Santa Cruz      0.787 0.0753   10.45 0.001667 0.653 0.796 0.898  2044 1.000
## Shasta          0.816 0.0772   10.58 0.001726 0.675 0.825 0.925  1998 1.000
## Sierra          0.723 0.2276    3.18 0.004201 0.333 0.667 1.000  2935 1.000
## Siskiyou        0.842 0.1004    8.39 0.001970 0.667 0.867 1.000  2595 0.999
## Solano          0.866 0.0634   13.67 0.001374 0.750 0.867 0.967  2125 1.002
## Sonoma          0.846 0.0620   13.64 0.001423 0.741 0.852 0.935  1900 1.000
## Stanislaus      0.812 0.0738   11.00 0.001822 0.681 0.824 0.901  1642 1.004
## Sutter          0.832 0.0904    9.20 0.001722 0.650 0.850 0.950  2758 1.000
## Tehama          0.842 0.1004    8.38 0.002095 0.647 0.882 1.000  2297 1.001
## Trinity         0.745 0.2051    3.63 0.004003 0.500 0.750 1.000  2624 1.000
## Tulare          0.831 0.0667   12.45 0.001492 0.709 0.836 0.927  2002 1.000
## Tuolumne        0.878 0.0968    9.07 0.002004 0.746 0.917 1.000  2333 0.999
## Ventura         0.840 0.0493   17.03 0.001061 0.758 0.845 0.913  2159 1.000
## Yolo            0.853 0.0736   11.59 0.001598 0.714 0.857 0.971  2125 1.001
## Yuba            0.793 0.1014    7.82 0.002037 0.632 0.789 0.947  2477 1.000

theta <- c(tapply(apipop$sch.wide == "Yes", apipop$cname, mean))  # true population quantities
plot_coef(summ, list(est=theta), n.se=2, est.names=c("mcmcsae", "true"), maxrows=30)

Basic unit-level models

Binomial Unit-Level Model

References