Conjugate Hierarchical Models

Paul Northrop

2020-02-24

The bang package simulates from the posterior distributions involved in certain Bayesian models. See the vignette Introducing bang: Bayesian Analysis, No Gibbs for an introduction. In this vignette we consider the Bayesian analysis of certain conjugate hierarchical models. We give only a brief outline of the structure of these models. For a full description see Chapter 5 of Gelman et al. (2014).

Suppose that, for \(j = 1, \ldots, J\), experiment \(j\) of \(J\) experiments yields a data vector \(Y_j\) and associated parameter vector \(\theta_j\). Conditional on \(\theta_j\) the data \(Y_j\) are assumed to follow independent response distributions from the exponential family of probability distributions. A prior distribution \(\pi(\theta_j \mid \phi)\) is placed on each of the population parameters \(\theta_1, \ldots, \theta_J\), where \(\phi\) is a vector of hyperparameters. For mathematical convenience \(\pi(\theta_j \mid \phi)\) is selected to be conditionally conjugate, that is, conditionally on \(\phi\) the posterior distribution of \(\theta_j\) is of the same type as \(\pi(\theta_j \mid \phi)\).

Use of a conditionally conjugate prior means that it is possible to derive, and simulate from, the marginal posterior density \(\pi(\phi \mid \boldsymbol{\mathbf{y}})\). The hef function does this using the generalized ratio-of-uniforms method, implemented by the function ru in the rust package (Northrop 2017). By default a model-specific transformation of the parameter vector \(\phi\) is used to improve efficiency. See the documentation of hef and the two examples below for details. Simulation from the full posterior density \(\pi(\boldsymbol{\mathbf{\theta}}, \phi \mid \boldsymbol{\mathbf{y}}) = \pi(\boldsymbol{\mathbf{\theta}} \mid \phi, \boldsymbol{\mathbf{y}}) \pi(\phi \mid \boldsymbol{\mathbf{y}})\) follows directly because conditional conjugacy means that it is simple to simulate from \(\pi(\boldsymbol{\mathbf{\theta}} \mid \phi, \boldsymbol{\mathbf{y}})\), given the values simulated from \(\pi(\phi \mid \boldsymbol{\mathbf{y}})\). This simulation is also performed by the function hef.
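The derivation of the marginal posterior can be sketched in general terms. The notation \(f(y_j \mid \theta_j)\) for the response density and \(\pi(\phi)\) for the hyperprior density is introduced here for illustration only. Assuming that the pairs \((Y_j, \theta_j)\) are independent across experiments given \(\phi\), each \(\theta_j\) can be integrated out separately: \[ \pi(\phi \mid \boldsymbol{\mathbf{y}}) \propto \pi(\phi) \prod_{j=1}^J \int f(y_j \mid \theta_j) \, \pi(\theta_j \mid \phi) \, {\rm d}\theta_j. \] Conditional conjugacy is what makes each integral available in closed form. By Bayes' theorem, for any value of \(\theta_j\) at which the densities are positive, \[ \int f(y_j \mid \theta_j) \, \pi(\theta_j \mid \phi) \, {\rm d}\theta_j = \frac{f(y_j \mid \theta_j) \, \pi(\theta_j \mid \phi)}{\pi(\theta_j \mid \phi, y_j)}, \] and the right-hand side involves only densities of known form. The model-specific expressions given in the two examples that follow are instances of this product.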

Beta-binomial model

We consider the example presented in Section 5.3 of Gelman et al. (2014), in which the data (Tarone 1982) in the matrix rat are analysed. These data contain information about an experiment in which, for each of 71 groups of rats, the total number of rats in the group and the number of rats that develop a tumor are recorded, so that \(J = 71\). Conditional on \(\boldsymbol{\mathbf{\theta}} = (\theta_1, \ldots, \theta_J) = (p_1, \ldots, p_J)\) we assume independent binomial distributions for \((Y_1, \ldots, Y_J)\), that is, \(Y_j \sim {\rm binomial}(n_j, p_j)\). We use the conditionally conjugate priors \(p_j \sim {\rm beta}(\alpha, \beta)\), so that \(\phi = (\alpha, \beta)\).

The conditional conjugacy of the priors means that the marginal posterior of \((\alpha, \beta)\) given \(\boldsymbol{\mathbf{y}} = (y_1, \ldots, y_J)\) can be determined (equation (5.8) of Gelman et al. (2014)) as \[ \pi(\alpha, \beta \mid \boldsymbol{\mathbf{y}}) \propto \pi(\alpha, \beta) \prod_{j=1}^J \frac{B(\alpha + y_j, \beta + n_j - y_j)}{B(\alpha, \beta)}, \] where \(\pi(\alpha, \beta)\) is the hyperprior density for \((\alpha, \beta)\). By default \(\phi = (\alpha, \beta)\) is transformed prior to sampling using \((\rho_1, \rho_2)=(\log (\alpha/\beta), \log (\alpha+\beta))\). In addition, to improve efficiency, the (mode-relocated) posterior density on this scale is rotated and scaled in an attempt to make its contours nearly circular.
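As an illustration of this formula, the following base R sketch evaluates the log of the unnormalised marginal posterior, both on the \((\alpha, \beta)\) scale and on the \((\rho_1, \rho_2)\) scale. It is not the code used inside hef. The function names log_post and log_post_rho and the toy data y and n are hypothetical, and the hyperprior is assumed to be \((\alpha+\beta)^{-5/2}\). On the transformed scale the density picks up the Jacobian factor \(\alpha \beta\), because the determinant of \(\partial(\rho_1, \rho_2)/\partial(\alpha, \beta)\) is \(1/(\alpha \beta)\).

```r
# Toy data (hypothetical, NOT the rat data): y tumours out of n rats per group
y <- c(0, 1, 2, 4, 5)
n <- c(20, 20, 19, 20, 25)

# Log of the unnormalised marginal posterior of (alpha, beta),
# assuming the hyperprior pi(alpha, beta) proportional to (alpha + beta)^(-5/2)
log_post <- function(alpha, beta, y, n) {
  if (alpha <= 0 || beta <= 0) return(-Inf)
  log_prior <- -2.5 * log(alpha + beta)
  # lbeta() is the log of the beta function B(a, b)
  log_lik <- sum(lbeta(alpha + y, beta + n - y) - lbeta(alpha, beta))
  log_prior + log_lik
}

# The same density expressed on the scale
# (rho1, rho2) = (log(alpha / beta), log(alpha + beta))
log_post_rho <- function(rho1, rho2, y, n) {
  # Invert the transformation
  alpha <- exp(rho1 + rho2) / (1 + exp(rho1))
  beta <- exp(rho2) / (1 + exp(rho1))
  # Add the log-Jacobian log(alpha * beta) of the inverse transformation
  log_post(alpha, beta, y, n) + log(alpha) + log(beta)
}

log_post(2, 14, y, n)
log_post_rho(log(2 / 14), log(2 + 14), y, n)
```

The two calls differ only by the log-Jacobian term \(\log(2 \times 14)\). A function like log_post_rho is the kind of target density that a ratio-of-uniforms sampler requires, up to the further relocation, rotation and scaling described above.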

To simulate from the full posterior density \(\pi(\boldsymbol{\mathbf{\theta}}, \alpha, \beta \mid \boldsymbol{\mathbf{y}}) = \pi(\boldsymbol{\mathbf{\theta}} \mid \alpha, \beta, \boldsymbol{\mathbf{y}}) \pi(\alpha, \beta \mid \boldsymbol{\mathbf{y}})\) we first sample from \(\pi(\alpha, \beta \mid \boldsymbol{\mathbf{y}})\). Simulation from the conditional posterior distribution of \(\boldsymbol{\mathbf{\theta}}\) given \((\alpha, \beta, \boldsymbol{\mathbf{y}})\) is then straightforward on noting that \[ \theta_j \mid \alpha, \beta, y_j \sim {\rm beta}(\alpha + y_j, \beta + n_j - y_j) \] and that \(\theta_j, j = 1, \ldots, J\) are conditionally independent.
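The second stage of this two-stage scheme can be sketched in base R as follows. This is an illustration under stated assumptions rather than the implementation in hef: the toy data y and n and the three \((\alpha, \beta)\) values in ab are hypothetical, with the latter standing in for draws from \(\pi(\alpha, \beta \mid \boldsymbol{\mathbf{y}})\).

```r
set.seed(1)

# Toy data (hypothetical, NOT the rat data)
y <- c(0, 1, 2, 4, 5)
n <- c(20, 20, 19, 20, 25)
J <- length(y)

# Hypothetical values standing in for draws from pi(alpha, beta | y)
ab <- cbind(alpha = c(2.2, 1.8, 2.8), beta = c(13.3, 10.7, 16.8))

# For each (alpha, beta) draw, simulate theta_1, ..., theta_J independently
# from beta(alpha + y_j, beta + n_j - y_j)
theta <- matrix(NA_real_, nrow = nrow(ab), ncol = J)
for (i in seq_len(nrow(ab))) {
  theta[i, ] <- rbeta(J, ab[i, "alpha"] + y, ab[i, "beta"] + n - y)
}

# Row i of theta is one draw of (theta_1, ..., theta_J) given draw i of (alpha, beta)
dim(theta)
```

Each row of theta, paired with the corresponding row of ab, is then a draw from the full posterior, provided that the rows of ab really are draws from the marginal posterior of \((\alpha, \beta)\).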

The hyperprior for \((\alpha, \beta)\) used by default in hef is \(\pi(\alpha, \beta) \propto (\alpha+\beta)^{-5/2}, \alpha>0, \beta>0\), following Section 5.3 of Gelman et al. (2014). A user-defined prior may be set using set_user_prior.

library(bang)
# Default prior, sampling on (rotated) (log(alpha / beta), log(alpha + beta)) scale
rat_res <- hef(model = "beta_binom", data = rat, n = 10000)
plot(rat_res)
plot(rat_res, ru_scale = TRUE)

The plot on the left shows the values sampled from the posterior distribution of \((\alpha, \beta)\) with superimposed density contours. On the right is a similar plot displayed on the scale used for sampling, that is, \((\rho_1, \rho_2)\).

The following summary gives properties of the generalized ratio-of-uniforms algorithm, in particular the estimated probability of acceptance, and summary statistics of the posterior sample of \((\alpha, \beta)\).

summary(rat_res)
#> ru bounding box:  
#>                box       vals1       vals2 conv
#> a        1.0000000  0.00000000  0.00000000    0
#> b1minus -0.2382163 -0.40313465 -0.03906170    0
#> b2minus -0.2174510  0.05447431 -0.35297539    0
#> b1plus   0.2231876  0.36718411 -0.06551353    0
#> b2plus   0.2512577  0.05665707  0.44459818    0
#> 
#> estimated probability of acceptance:  
#> [1] 0.5258729
#> 
#> sample summary 
#>      alpha              beta       
#>  Min.   : 0.6152   Min.   : 3.883  
#>  1st Qu.: 1.7895   1st Qu.:10.675  
#>  Median : 2.2198   Median :13.329  
#>  Mean   : 2.4051   Mean   :14.341  
#>  3rd Qu.: 2.8034   3rd Qu.:16.830  
#>  Max.   :14.1474   Max.   :80.895

Gamma-Poisson model

We perform a fully Bayesian analysis of an empirical Bayesian example presented in Section 4.2 of Gelfand and Smith (1990), who fix the hyperparameter \(\alpha\) described below at a point estimate derived from the data. The pump dataset (Gaver and O’Muircheartaigh 1987) is a matrix in which each row gives information about one of 10 different pump systems. The first column contains the number of pump failures. The second column contains the length of operating time, in thousands of hours.

pump
#>       failures    time
#>  [1,]        5  94.320
#>  [2,]        1  15.720
#>  [3,]        5  62.880
#>  [4,]       14 125.760
#>  [5,]        3   5.240
#>  [6,]       19  31.440
#>  [7,]        1   1.048
#>  [8,]        1   1.048
#>  [9,]        4   2.096
#> [10,]       22  10.480

The general setup is similar to the beta-binomial model described above but now the response distribution is taken to be Poisson, the prior distribution is gamma and \(J = 10\). For \(j = 1, \ldots, J\) let \(Y_j\) denote the number of failures and \(e_j\) the length of operating time for pump system \(j\). Conditional on \(\boldsymbol{\mathbf{\lambda}} = (\lambda_1, \ldots, \lambda_J)\) we assume independent Poisson distributions for \((Y_1, \ldots, Y_J)\) with means that are proportional to the exposure time \(e_j\), that is, \(Y_j \sim {\rm Poisson}(e_j \lambda_j)\). We use the conditionally conjugate priors \(\lambda_j \sim {\rm gamma}(\alpha, \beta)\), so that \(\phi = (\alpha, \beta)\). We use the parameterization where \(\beta\) is a rate parameter, so that \({\rm E}(\lambda_j) = \alpha/\beta\) a priori.

The marginal posterior of \((\alpha, \beta)\) given \(\boldsymbol{\mathbf{y}} = (y_1, \ldots, y_J)\) can be determined as \[ \pi(\alpha, \beta \mid \boldsymbol{\mathbf{y}}) \propto \pi(\alpha, \beta) \prod_{j=1}^J \frac{\beta^\alpha\Gamma(\alpha + y_j)}{(\beta+e_j)^{\alpha+y_j} \Gamma(\alpha)}, \] where \(\pi(\alpha, \beta)\) is the hyperprior density for \((\alpha, \beta)\). The scale used for sampling is \((\rho_1, \rho_2)=(\log (\alpha/\beta), \log \beta)\).
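This expression can be evaluated directly in base R, as in the following sketch. It is an illustration, not the code used by hef. The function name log_post_gp is hypothetical. The hyperprior is assumed to consist of independent gamma distributions for \(\alpha\) and \(\beta\), and the ordering of the four values in hpars (shape and rate for \(\alpha\), then shape and rate for \(\beta\)) is also an assumption. The data are the pump failures and operating times printed above.

```r
# The pump data, copied from the matrix printed above
failures <- c(5, 1, 5, 14, 3, 19, 1, 1, 4, 22)
time <- c(94.320, 15.720, 62.880, 125.760, 5.240,
          31.440, 1.048, 1.048, 2.096, 10.480)

# Log of the unnormalised marginal posterior of (alpha, beta).
# ASSUMPTION: independent gamma hyperpriors with
# hpars = (shape for alpha, rate for alpha, shape for beta, rate for beta)
log_post_gp <- function(alpha, beta, y, e, hpars = c(1, 0.01, 1, 0.01)) {
  if (alpha <= 0 || beta <= 0) return(-Inf)
  log_prior <- dgamma(alpha, shape = hpars[1], rate = hpars[2], log = TRUE) +
    dgamma(beta, shape = hpars[3], rate = hpars[4], log = TRUE)
  # One term per pump system: beta^alpha * Gamma(alpha + y_j) /
  # ((beta + e_j)^(alpha + y_j) * Gamma(alpha)), on the log scale
  log_lik <- sum(alpha * log(beta) + lgamma(alpha + y) -
                   (alpha + y) * log(beta + e) - lgamma(alpha))
  log_prior + log_lik
}

log_post_gp(1, 2, failures, time)
```

As in the beta-binomial case, moving to the scale \((\rho_1, \rho_2)\) would add the log-Jacobian \(\log \alpha + \log \beta\), because the determinant of \(\partial(\rho_1, \rho_2)/\partial(\alpha, \beta)\) is again \(1/(\alpha \beta)\).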

To simulate from the full posterior \(\pi(\boldsymbol{\mathbf{\lambda}}, \alpha, \beta \mid \boldsymbol{\mathbf{y}}) = \pi(\boldsymbol{\mathbf{\lambda}} \mid \alpha, \beta, \boldsymbol{\mathbf{y}}) \pi(\alpha, \beta \mid \boldsymbol{\mathbf{y}})\) we first sample from \(\pi(\alpha, \beta \mid \boldsymbol{\mathbf{y}})\) and then note that \[ \lambda_j \mid \alpha, \beta, y_j \sim {\rm gamma}(\alpha + y_j, \beta + e_j) \] and that \(\lambda_j, j = 1, \ldots, J\) are conditionally independent.
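The conditional simulation step can be sketched in base R using the pump data printed above. The values of alpha and beta below are illustrative fixed values, not draws from \(\pi(\alpha, \beta \mid \boldsymbol{\mathbf{y}})\). In a full analysis this step would be repeated for each simulated \((\alpha, \beta)\) pair.

```r
set.seed(1)

# The pump data, copied from the matrix printed above
failures <- c(5, 1, 5, 14, 3, 19, 1, 1, 4, 22)
time <- c(94.320, 15.720, 62.880, 125.760, 5.240,
          31.440, 1.048, 1.048, 2.096, 10.480)
J <- length(failures)

# Illustrative fixed hyperparameter values (an assumption made for this sketch)
alpha <- 1.06
beta <- 1.95

# One draw of (lambda_1, ..., lambda_J) from the conditional posterior
# lambda_j | alpha, beta, y_j ~ gamma(alpha + y_j, beta + e_j), with beta + e_j a rate
lambda <- rgamma(J, shape = alpha + failures, rate = beta + time)

# Conditional posterior means, compared with the raw failure rates y_j / e_j
cond_mean <- (alpha + failures) / (beta + time)
raw_rate <- failures / time
cbind(raw_rate, cond_mean)
```

The conditional mean \((\alpha + y_j)/(\beta + e_j)\) is a weighted average of the raw rate \(y_j / e_j\) and the prior mean \(\alpha/\beta\), with weights proportional to \(e_j\) and \(\beta\) respectively, so pump systems with short operating times are shrunk more strongly towards the prior mean.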

By default hef takes \(\alpha\) and \(\beta\) to be independent gamma random variables a priori. The parameters of these gamma distributions can be set by the user, using the argument hpars, or a different prior may be set using set_user_prior.

We produce similar output to the beta-binomial example above.

pump_res <- hef(model = "gamma_pois", data = pump, hpars = c(1, 0.01, 1, 0.01))
plot(pump_res)
plot(pump_res, ru_scale = TRUE)
summary(pump_res)
#> ru bounding box:  
#>                box       vals1       vals2 conv
#> a        1.0000000  0.00000000  0.00000000    0
#> b1minus -0.5174980 -0.91869101 -0.06060116    0
#> b2minus -0.5150835  0.15757254 -0.92429417    0
#> b1plus   0.4124640  0.65433383 -0.11046433    0
#> b2plus   0.4224941  0.08788857  0.67847965    0
#> 
#> estimated probability of acceptance:  
#> [1] 0.5094244
#> 
#> sample summary 
#>      alpha             beta        
#>  Min.   :0.2007   Min.   :0.08544  
#>  1st Qu.:0.8185   1st Qu.:1.29559  
#>  Median :1.0598   Median :1.94558  
#>  Mean   :1.1496   Mean   :2.16025  
#>  3rd Qu.:1.3915   3rd Qu.:2.73560  
#>  Max.   :3.6629   Max.   :9.28544

References

Gaver, Donald P., and I. G. O’Muircheartaigh. 1987. “Robust Empirical Bayes Analyses of Event Rates.” Technometrics 29 (1): 1–15. http://www.jstor.org/stable/1269878.

Gelfand, Alan E., and Adrian F. M. Smith. 1990. “Sampling-Based Approaches to Calculating Marginal Densities.” Journal of the American Statistical Association 85 (410): 398–409. http://www.jstor.org/stable/2289776.

Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. 2014. Bayesian Data Analysis. 3rd ed. Florida, USA: Chapman & Hall / CRC. http://www.stat.columbia.edu/~gelman/book/.

Northrop, P. J. 2017. rust: Ratio-of-Uniforms Simulation with Transformation. https://CRAN.R-project.org/package=rust.

Tarone, Robert E. 1982. “The Use of Historical Control Information in Testing for a Trend in Proportions.” Biometrics 38 (1): 215–20. http://www.jstor.org/stable/2530304.