The R package singcar: Comparing Single Cases to Small Samples

Jonathan Ö. Rittmo

University of Edinburgh
j.rittmo@gmail.com

Robert D. McIntosh

University of Edinburgh
r.d.mcintosh@ed.ac.uk

Abstract

Comparison of single cases to populations estimated from a sample has many potential applications. Historically, one major application is in the field of neuropsychology, where single-case statistics may be used to infer whether an individual has suffered a significant cognitive deficit as the consequence of a brain lesion. One may wish to estimate whether that individual has abnormally low performance on some cognitive ability, or if one cognitive ability is abnormally discrepant with respect to another cognitive ability. Several statistical methods have been developed to test for abnormality on a single variate and abnormality of the difference between two variates when a single case is compared to a sample, without losing control of the Type I error rate. This paper describes some of the main methods and presents a package in which they are implemented.

1 Introduction

There are many reasons why researchers and clinicians might want to look at single cases rather than at the average of some group. In certain fields, such as neuropsychology, this need arises because the pattern of naturally-occurring brain damage will be unique in each individual case. From a theoretical perspective, this means that a single patient might be the only available source of data for a given phenomenon. From a practical, clinical perspective, diagnosis and description of the pattern of cognitive impairment is done at the individual level. Individual brain-damaged patients are thus often compared to the healthy population to assess changes in cognitive functioning. If we want to assess a patient's score on some variate Y, for which we do not know the population parameters, these must be estimated from a sample. Thus, the single case of interest is compared to a control sample. There are many other areas where the application of such methods could also be useful: for example, studies of uncommon human expertise, targeted quality checks in industries with limited production output, or animal studies where rearing a large experimental group might be infeasible.

As it represents the canonical field for the application of these methods, the nomenclature of neuropsychology will here be adopted. An abnormally low score on a single variate will be referred to as a deficit, an important concept for clinical and basic neuropsychology alike. For the latter area another concept is also considered to be of cardinal importance: the ability to test for dissociation, which is taken to provide evidence for some degree of functional independence between two cognitive abilities. By charting dissociations, a cognitive architecture of the mind can be theorized (Shallice 1988).

During the last 20 years methods have been developed to estimate and test for both deficits and dissociations in the single case, while controlling the Type I error rate, mainly by John Crawford and Paul Garthwaite (e.g., Crawford and Howell 1998; Crawford, Garthwaite, and Ryan 2011; Crawford and Garthwaite 2007, 2002, 2005). They are available as standalone computer programs, only taking summary data as input, at https://homepages.abdn.ac.uk/j.crawford/pages/dept/psychom.htm. But the majority of these methods have not yet been implemented in any standard statistical environment. By doing so in the package singcar for the R environment (R Core Team 2020), our aim is to encourage and simplify their usage. Further, limiting Type II errors has not received as much attention as limiting Type I errors in single-case methodology (McIntosh and Rittmo 2020). By including novel tools for power calculations, our hope is to increase awareness of the inherently low power in this field as well as to aid in the planning and design of experiments.

The R package singcar contains seven functions to estimate a case’s abnormality compared to a normal population estimated from a small sample, three of them with regard to a single variate and four with regard to the discrepancy between two variates. Both frequentist and Bayesian methods are provided, all developed originally by Crawford and colleagues (Crawford and Howell 1998; Crawford, Garthwaite, and Ryan 2011; Crawford and Garthwaite 2007, 2002, 2005). Of special note for psychological research are the methods allowing the inclusion of covariates (Crawford, Garthwaite, and Ryan 2011) using Bayesian regression techniques (Section 2.3.3). These methods make matching the control sample to the case less cumbersome.

In Section 2 the implemented methods are described in detail and in Section 3 the package is described and usage exemplified with a dataset from a recent neuropsychological single case study.

2 Comparing a single case to small samples

2.1 Background

In the neuropsychological application of the methods to be presented, the variates of interest will be scores obtained by participants on tasks assessing relevant cognitive functions. The variates will thus often be referred to as task scores. Historically, it was not uncommon for researchers to compare single cases against a control population estimated from a sample by evaluating the case score as a \(Z\) score from the estimated distribution. The \(p\) value associated with this \(Z\) score would then be treated as an estimate of the case’s abnormality. A similar logic was sometimes applied for estimating the abnormality of the discrepancy between two tasks, using a method developed by Payne and Jones (1957): researchers calculated a \(Z\) score based on the case’s difference between two standardised variates divided by the standard deviation of the difference.

The \(Z\) score approach is of course problematic because it treats parameter estimations from a restricted control sample as if they were population parameters. This could be appropriate with very large samples, but control samples in neuropsychology are often small (sometimes < 10), and it is well known that the sampling distribution of the sample variance is right skewed for small sample sizes. Hence, underestimation of variance would be more probable than overestimation, inflating obtained \(Z\) scores and thus Type I errors if the Z scores were used for hypothesis testing (Crawford and Howell 1998).

The following sections describe methods that have been devised to allow for the evaluation of the abnormality of a case against a restricted control sample, whilst retaining appropriate control over the Type I error rate. These methods include frequentist approaches (Section 2.2) and Bayesian approaches (Section 2.3).

2.2 Frequentist approaches

An appropriate method for comparing a single observation to the mean of a sample was proposed within the biological sciences by Sokal and Rohlf (1981) (p. 227). It was popularized within neuropsychology by Crawford and Howell (1998), where its common application was as a (one-tailed) test of deficit (TD) to determine whether the score of a single case with brain damage was abnormally low (assuming that a low score indicates poorer performance) with respect to the mean score of a control sample. The \(t\) distribution is used to account for the underestimation of the sample variance. The basic approach is a modified two-sample \(t\) test in which the case is simply treated as a sample of size 1. The degrees of freedom for this distribution are \(n + 1 - 2 = n - 1\).

\[\begin{equation} t_{n-1} = \frac{y^* - \overline{y}}{s \sqrt{\frac{n + 1}{n}}} \tag{2.1} \label{eq:TD} \end{equation}\]

Where \(y^*\) is the case score, \(\overline{y}\) and \(s\) are the sample mean and sample standard deviation respectively, and \(n\) is the size of the control sample. This method does not allow estimation of a notional patient population. However, since the question posed when using this test is about the probability of the case being part of the control population, the potential alternative affinity of the case is of no concern (Crawford and Howell 1998). This test of deficit provides transparent control of the Type I error rate (Crawford and Garthwaite 2012). Moreover, the \(p\) value obtained constitutes an unbiased point estimate of the abnormality of the case, as demonstrated by Crawford and Garthwaite (2006). The simple proof for this is given below.

The proportion of controls that would score lower on a random variable \(Y\) than the case \(y^*\) is \[ \mathbb{P}[Y< y^*] \] Subtracting \(\overline{y}\) from both sides of the inequality and dividing by \(s \sqrt{\frac{n + 1}{n}}\) gives \[ \mathbb{P}[Y< y^*] =\mathbb{P}\left[\frac{Y-\overline{y}}{s \sqrt{\frac{n + 1}{n}}} < \frac{y^* - \overline{y}}{s \sqrt{\frac{n + 1}{n}}}\right] \] The quantity to the left of the inequality, i.e., \(\frac{Y-\overline{y}}{s \sqrt{\frac{n + 1}{n}}}\), is \(t\) distributed with \(n-1\) degrees of freedom. Hence, \[ \mathbb{P}[Y< y^*] =\mathbb{P}\left[t_{n-1} < \frac{y^* - \overline{y}}{s \sqrt{\frac{n + 1}{n}}}\right] \] The quantity to the right of the inequality is the test statistic from Equation (2.1), hence \(\mathbb{P}[Y< y^*]\) is the same as the \(p\) value obtained from the test of deficit. This fact also makes the construction of confidence intervals for this abnormality possible.
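For illustration, the test statistic and abnormality estimate can be computed directly from summary statistics in a few lines of R. The sketch below is a minimal transcription of Equation (2.1) with arbitrary illustrative values; the package function TD(), described in Section 3, implements the same test together with confidence intervals.

# Minimal sketch of the test of deficit (Equation 2.1) from summary statistics
case <- 70                                       # case score y*
m <- 100; s <- 15; n <- 16                       # control mean, SD and sample size
t_stat <- (case - m) / (s * sqrt((n + 1) / n))   # Equation (2.1)
p_val <- pt(t_stat, df = n - 1)                  # one-sided p value (deficit)
c(t = t_stat, p = p_val, pct_below = 100 * p_val)

The value p_val is both the one-sided \(p\) value and the point estimate of the proportion of controls expected to score below the case.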

Crawford, Garthwaite, and Porter (2010) pointed out that, although the \(Z\) score is not appropriate for estimating the abnormality of a case when the control sample is small, it does provide a standardised effect size measure of abnormality, similar to Cohen’s d (Cohen 1988). Insensitivity to sample size is indeed a requirement for an effect size index and the proposed quantity was: \[\begin{equation} Z_{CC} = \frac{y^* - \overline{y}}{s} \tag{2.2} \label{eq:zcc} \end{equation}\] This is simply a \(Z\) score calculated from sample estimates. The subscript (CC) indicates that it relates to a “case-controls” comparison. The construction of confidence intervals for the point estimate of abnormality using the \(Z_{CC}\) index of effect size was described by Crawford and Garthwaite (2002).

Let \(p\) be the percentage of the population that would fall below the case score; then the \(1-\alpha\) confidence interval for \(p\) is constructed as follows: Let \(Z_{CC} = \frac{y^*-\overline{y}}{s}\) with \(y^* \neq \overline{y}\); then \(Z_{CC}\sqrt{n}\) follows a non-central \(t\) distribution on \(n-1\) degrees of freedom. By deploying a search algorithm we find the value \(\delta_U\) for the non-centrality parameter (NCP) of a non-central \(t\) distribution on \(n-1\) degrees of freedom such that its \(100\frac{\alpha}{2}\) percentile equals \(Z_{CC}\sqrt{n}\), and similarly we find the NCP \(\delta_L\) of a non-central \(t\) distribution such that its \(100(1-\frac{\alpha}{2})\) percentile equals \(Z_{CC}\sqrt{n}\). The upper and lower boundaries for \(Z_{CC}\) are then given by: \[ Z_{CC_U} = \frac{\delta_U}{\sqrt{n}}, \ \ Z_{CC_L} = \frac{\delta_L}{\sqrt{n}} \] and the boundaries for \(p\) by: \[ p_U = \Phi\left(\frac{\delta_U}{\sqrt{n}}\right), \ p_L = \Phi\left(\frac{\delta_L}{\sqrt{n}}\right) \] Where \(\Phi\) is the CDF of the standard normal distribution. Note that the above equations are appropriate when the case score falls on the left side of the distribution. If it were to fall on the right side, the upper and lower boundaries would instead be given by: \[ p_U = 1 - \Phi\left(\frac{\delta_L}{\sqrt{n}}\right), \ p_L = 1 - \Phi\left(\frac{\delta_U}{\sqrt{n}}\right) \]
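This search is straightforward to carry out with the non-central \(t\) distribution functions in base R. The sketch below uses illustrative summary values and an arbitrary bracketing interval for uniroot(); it is not the package's internal code, but it follows the logic just described for a case score below the control mean.

# Sketch: confidence limits on Z_CC and on the abnormality p (case below the mean)
case <- 70; m <- 100; s <- 15; n <- 16; alpha <- 0.05
zcc <- (case - m) / s              # Equation (2.2)
q <- zcc * sqrt(n)                 # treated as non-central t on n - 1 df
delta_U <- uniroot(function(d) pt(q, df = n - 1, ncp = d) - alpha / 2,
                   interval = q + c(-20, 20))$root
delta_L <- uniroot(function(d) pt(q, df = n - 1, ncp = d) - (1 - alpha / 2),
                   interval = q + c(-20, 20))$root
c(zcc_lower = delta_L, zcc_upper = delta_U) / sqrt(n)   # CI for Z_CC
100 * pnorm(c(delta_L, delta_U) / sqrt(n))              # CI for p (%)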

As has been shown, estimating abnormality on a single variate is a simple matter. For estimating abnormality of discrepancy, when the normally distributed variates \(Y_1\) and \(Y_2\) can be compared without standardisation, the approach is equally simple and in fact identical to the test of deficit, Equation (2.1), except that the statistic is based on difference scores, taking the correlation between the variates into account (\(\rho_{12}\)). \[\begin{equation} t_{n-1} = \frac{(y^*_1 - \overline{y}_1) - (y^* _2 - \overline{y}_2) }{ \sqrt{(s^2_1 +s^2_2 -2s_1 s_2 \rho_{12})(\frac{n+1}{n})}} \tag{2.3} \label{eq:UDT} \end{equation}\] Construction of confidence intervals for abnormality estimates based on this unstandardised difference test (UDT) is done in an analogous way to that of the confidence intervals of the TD (Crawford and Garthwaite 2005). However, the test is somewhat limited in its usefulness, because it is only applicable if the two variates are measured on equivalent scales. Far more common, at least in neuropsychology, is the need to assess discrepancies between differently-scaled variates, which require standardisation to be comparable.

By standardising the variates (not taking sample size into account) we get an effect size measure similar to \(Z_{CC}\), Equation (2.2) (Crawford, Garthwaite, and Porter 2010): \[\begin{equation} Z_{DCC} = \frac{z^*_1 - z^*_2}{\sqrt{2-2\rho_{12}}} \tag{2.4} \label{eq:PJ} \end{equation}\] Where \(z_1^*\) and \(z_2^*\) are the standardised case scores on some task A and some task B and the subscript (DCC) indicates “discrepancy-case-controls.” This quantity was proposed by Payne and Jones (1957) as a significance test for discrepancies, but of course it is not appropriate for such a purpose if small samples are used: we cannot use the \(Z\) distribution for estimating the abnormality of the discrepancy for the same reason we cannot use \(Z_{CC}\) to test for a deficit.

When the parameters of variates \(Y_1\) and \(Y_2\) must be estimated from small samples and the scores need to be standardised, we are in effect estimating the discrepancy between two \(t\) distributed variates rather than between two normally distributed variates. Because linear combinations of correlated \(t\) distributed random variates are not themselves \(t\) distributed, this problem is non-trivial. The distribution of such difference scores was examined by Garthwaite and Crawford (2004). They used asymptotic expansion to find functions of the sample correlation that, if used as a denominator to the difference between the variates, would yield an asymptotically \(t\) distributed quantity. They found:

\[\begin{equation} \psi=\frac{\frac{(y^*_1-\overline{y}_1)}{s_{1}}-\frac{(y^*_2-\overline{y}_2)}{s_{2}}}{ \sqrt{ (\frac{n+1}{n}) \left( (2-2 \rho)+ \frac{2(1-\rho^{2})}{n-1}+ \frac{(5+c^{2})(1-\rho^{2})}{2(n-1)^{2}}+ \frac{\rho(1+c^{2})(1-\rho^{2})}{2(n-1)^{2}}\right) }} \end{equation}\]

Where \(\rho\) is the sample correlation between the tasks (written \(r\) in Equation (2.5) below) and \(c\) is the critical two-tailed \(t\) value with \(n-1\) degrees of freedom. Garthwaite and Crawford (2004) demonstrated that \(\mathbb{P}[ \psi > c] \approx \mathbb{P}[t >c]\). To obtain a precise probability for \(\psi\), one solves \(\psi = c\), which gives a quantity that does not depend on a pre-specified critical value. \(\psi = c\) is a quadratic equation in \(c^2\), and choosing the positive root yields:

\[\begin{align} \begin{split} c & = \sqrt{\frac{ -b + \sqrt{b^2 - 4ad}}{2a}}, \ \text{where} \\ a & = (1+r)(1-r^2), \\ b & = (1-r)[4(n-1)^2+4(1+r)(n-1)+(1+r)(5+r)], \\ d & = - 2\left[\frac{y^*_{1} - \overline{y}_1}{s_1}-\frac{y^*_2 -\overline{y}_2}{s_2}\right]^2\left(\frac{n(n-1)^2}{n+1}\right) \end{split} \tag{2.5} \label{eq:RSDT} \end{align}\]

Where \(p = \mathbb{P}[t_{n-1}>c]\) is the estimate of abnormality and is used for significance testing. It should be noted that, because \(c\) is obtained as the positive root of a quadratic, the resultant statistic cannot be negative. If the test statistic is required to express the direction of a negative effect, the correct sign must be imposed.
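The quantities in Equation (2.5) translate directly into code. The sketch below computes the statistic and the abnormality estimate from arbitrary illustrative summary statistics; the package function RSDT(), described in Section 3, implements this test.

# Sketch of the revised standardised difference test (Equation 2.5)
y1 <- 70; y2 <- 85             # case scores on task A and task B
m1 <- 100; s1 <- 15            # control mean and SD, task A
m2 <- 90; s2 <- 10             # control mean and SD, task B
r <- 0.6; n <- 16              # control correlation between tasks and sample size
a <- (1 + r) * (1 - r^2)
b <- (1 - r) * (4 * (n - 1)^2 + 4 * (1 + r) * (n - 1) + (1 + r) * (5 + r))
d <- -2 * ((y1 - m1) / s1 - (y2 - m2) / s2)^2 * (n * (n - 1)^2 / (n + 1))
c_stat <- sqrt((-b + sqrt(b^2 - 4 * a * d)) / (2 * a))   # positive root
pt(c_stat, df = n - 1, lower.tail = FALSE)               # abnormality estimate

For a two-sided significance test the tail area is doubled, and the sign of the discrepancy must be read off from the standardised difference itself.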

The quantity in Equation (2.5) is referred to as the “revised standardised difference test” (RSDT). The name stems from it being preceded by a test similar to that of the unstandardised difference test (Equation (2.3)) but with standardised normal variates, developed by Crawford, Howell, and Garthwaite (1998). Crawford and Garthwaite (2005) showed with Monte Carlo simulations that the RSDT is superior in controlling Type I errors compared to their earlier \(t\) test and the \(Z\) score method of Payne and Jones (1957).

Even for sample sizes as small as \(n=5\), the RSDT was shown to only barely exceed the specified 5% error rate. However, if a case lacks a true discrepancy between the variates but exhibits extreme scores on both of them (in the same direction), the error rate of the RSDT increases steeply. Crawford and Garthwaite (2007) showed that the RSDT starts to lose control of the error rate when the scores on both variates are more than two standard deviations from the mean, without the case exhibiting a discrepancy. For task scores at 8 standard deviations from the mean, the Type I error rate of the RSDT was inflated to nearly 35%.

Another major drawback of the RSDT is that it has proved difficult to construct confidence limits on the point estimate of abnormality due to \(\psi\) only being approximately \(t\) distributed. To remedy this and to be able to provide an exact rather than just approximate estimate of abnormality Crawford and colleagues started looking into Bayesian methodology.

2.3 Bayesian approaches

In the frequentist framework, parameters are treated as fixed attributes of a population and their estimates are thought to converge to the true value across a series of trials. In the Bayesian framework, by contrast, parameters are treated as random variables with associated probability distributions. To estimate a parameter distribution, we can use any prior knowledge of that parameter to assign probabilities to possible values of the parameter, forming what is known as a prior distribution, or simply a prior. If no prior information is available we may want to use a non-informative prior, the simplest of which would assign equal probabilities to all possible parameter values. The prior is updated when new information is obtained to form what is called a posterior distribution. The posterior probability of a hypothesis (i.e., a specified value of the parameter) is calculated using Bayes’ theorem. If we disregard the marginal probability of the data, Bayes’ theorem can be rewritten as: \[\begin{equation*} posterior \ \propto \ likelihood \times prior \end{equation*}\]

That is, the posterior density of a hypothesis is proportional (\(\propto\)) to the likelihood of the data under that hypothesis multiplied by the prior probability of the hypothesis.

In many cases it is not feasible to calculate the posterior analytically, and Monte Carlo methods are often used instead. These methods are mathematical algorithms that solve numerical problems by repeated sampling from distributions. The algorithms differ depending on the problem at hand, but in general they all build on drawing random numbers according to the likelihood and the prior and observing the distribution formed after a large number of iterations. The peak of such a distribution is often taken as an estimate of the parameter of interest, and when using non-informative priors this often corresponds to the maximum likelihood estimate. Hence, using a non-informative prior often yields estimates with frequentist properties, a feature that is useful if hypothesis testing is desired.

This was a requirement when Crawford and Garthwaite (2007) and Crawford, Garthwaite, and Ryan (2011) developed Bayesian versions of the tests described in Section 2.2, now implemented in singcar. The following sections describe the procedural details of these methods which are all based on Monte Carlo simulations. The steps presented closely follow those outlined in the original papers with just a few slight changes of notation.

2.3.1 The Bayesian test of deficit

The Bayesian test of deficit (BTD) allows us to obtain an estimate of \(p\) (and accompanying credible intervals), which is the proportion of controls that would obtain a value more extreme than the case on the variate of interest.

Assume a sample of \(n\) controls on which we measure some value \(y\) that is normally distributed with unknown mean \(\mu\) and unknown variance \(\sigma^2\). Let \(\overline{y}\) and \(s^2\) denote the sample mean and sample variance respectively and \(y^*\) the case score. Because \(\mu\) and \(\sigma^2\) are unknown, the prior distribution of \(\mu\) is conditioned on the prior distribution of \(\sigma^2\). A non-informative prior \(\mu | \sigma^2 \sim \mathcal{N}(0, \ \infty)\) is assumed. For the posterior we have that the marginal distribution of \(\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}\) and the posterior distribution for \(\mu|\sigma^2 \sim \mathcal{N}(\overline{y}, \sigma^2/n)\), see e.g., Gelman et al. (2013) (p. 45 and 65). To estimate \(p\), the following steps are iterated:

  1. Let \(\psi\) be a random draw from a \(\chi^2\)-distribution on \(n-1 \ df\). Then let \(\hat{\sigma}^2_{(i)} = \frac{(n-1)s^2}{\psi}\) be the estimation of \(\sigma^2\) for this iteration, hence the subscript \((i)\).

  2. Let \(z\) be a random draw from a standard normal distribution. Then let \(\hat{\mu}_{(i)}=\overline{y}+z\sqrt{(\hat{\sigma}_{(i)}^2/n)}\) be the estimate of \(\mu\) for this iteration.

  3. With estimates of \(\mu\) and \(\sigma\), \(p\) is calculated conditional on these estimates being the “correct” \(\mu\) and \(\sigma\). Let \(z^*_{(i)}= \frac{y^* - \hat{\mu}_{(i)}}{\sqrt{\hat{\sigma}_{(i)}^2}}\) be the standardised case score; then \(\hat{p}_{(i)} =\Phi\left(z^*_{(i)}\right)\) or \(\hat{p}_{(i)} = 1-\Phi\left(z^*_{(i)}\right)\), depending on the alternative hypothesis, is the estimate of \(p\) for this iteration. That is, the probability of drawing a value more extreme than \(z^*_{(i)}\) from a standard normal distribution.

Repeating these steps a large number of times will yield a distribution of \(\hat{p}\), the mean of which is taken as the point estimate of \(p\) and used for hypothesis testing. Multiplied by 100 it gives a point estimate of the percentage of the control population expected to exhibit a more extreme score than the case. If the steps are repeated e.g., 1000 times, the 25th smallest and 25th largest \(\hat{p}_{(i)}\) give the lower and upper boundaries of the 95% credible interval for \(p\). Similarly, the 25th smallest and 25th largest values of \(z^*_{(i)}\) give the lower and upper boundaries of the 95% credible interval for \(Z_{CC}\), the point estimate of which is that of Equation (2.2). Crawford and Garthwaite (2007) showed that the results of this method converge to those of the frequentist test of deficit, Equation (2.1).
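The three steps above amount to only a few lines of code. The following is a minimal, vectorised sketch using base R and arbitrary summary values, assuming a one-sided (“less”) alternative; the package function BTD(), described in Section 3, provides the full implementation.

# Sketch of the Bayesian test of deficit (BTD) by Monte Carlo simulation
set.seed(1)
case <- 70; m <- 100; s <- 15; n <- 16; iter <- 10000
psi <- rchisq(iter, df = n - 1)
sigma2_hat <- (n - 1) * s^2 / psi                    # step 1: draw sigma^2
mu_hat <- m + rnorm(iter) * sqrt(sigma2_hat / n)     # step 2: draw mu | sigma^2
z_star <- (case - mu_hat) / sqrt(sigma2_hat)         # step 3: standardised case score
p_hat <- pnorm(z_star)                               # abnormality estimate per draw
mean(p_hat)                                          # point estimate of p
quantile(p_hat, c(0.025, 0.975))                     # 95% credible interval for p
quantile(z_star, c(0.025, 0.975))                    # 95% credible interval for Z_CC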

2.3.2 The Bayesian standardised difference test

The Bayesian standardised difference test (BSDT) follows a similar procedure to that of the BTD. However, we now assume a sample of \(n\) controls from which we obtain values on the variates \(Y_1\) and \(Y_2\), following a bivariate normal distribution and representing task A and task B. Let \(\overline{y}_1\) and \(\overline{y}_2\) denote the sample means and \[\begin{equation*} \pmb{A}=\begin{bmatrix} s^2_{1} & s_{12} \\ s_{12} & s^2_{2} \end{bmatrix} \end{equation*}\] the sample variance-covariance matrix, where \(s_{12}\) is the sample covariance; then \(\pmb{S} =\pmb{A}(n-1)\) is the sums of squares and cross products (SSCP) matrix. It is assumed that the observations come from a bivariate normal distribution with unknown mean \(\pmb{\mu}\) and unknown covariance matrix \(\Sigma\):

\[\begin{equation*} \pmb{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} \ \text{and} \ \Sigma=\begin{bmatrix} \sigma^2_{1} & \sigma_{12} \\ \sigma_{12} & \sigma^2_{2} \end{bmatrix} \end{equation*}\]

Let the case scores be denoted \(y_1^*\) and \(y_2^*\). Just as for the frequentist dissociation tests we want to estimate the proportion \(p\) of the control population that would exhibit a greater difference \(Y_1-Y_2\) than the case’s \(y_1^*-y_2^*\).

The multivariate generalization of the \(\chi^2\)-distribution is the Wishart distribution: a 1-dimensional Wishart distribution on \(n\) degrees of freedom is a \(\chi^2\)-distribution on \(n\) degrees of freedom. One can view the Wishart distribution parameterised with \(n\) degrees of freedom and some variance-covariance matrix as a distribution of SSCP matrices related to that variance-covariance matrix. Similarly, one can view the inverse Wishart distribution parameterised with the same degrees of freedom and an SSCP matrix as a distribution of variance-covariance matrices related to that SSCP matrix.

The non-informative prior for \(f(\pmb\mu, \Sigma^{-1})\) chosen in Crawford and Garthwaite (2007) was \(f(\pmb\mu, \Sigma^{-1}) \propto |\Sigma|\), rather than the perhaps more commonly used \(f(\pmb\mu, \Sigma^{-1}) \propto |\Sigma|^{(k+1)/2}\) (\(k =\) number of variates), because, for unstandardised differences, it was shown to yield interval estimates identical to those of the frequentist UDT. The posterior marginal distribution of \(\Sigma^{-1}\) for this choice of prior is a Wishart distribution with \(n\) degrees of freedom and scale matrix \(\pmb{S}^{-1}\). The conditional distribution of \(\pmb\mu|\Sigma\) is then a multivariate normal distribution with mean \([\overline{y}_1 \ \overline{y}_2]\) and variance \(\Sigma/n\), see e.g., Gelman et al. (2013) (p. 72-73). Crawford and Garthwaite (2007) refer to this as the “standard theory” prior.

Using the standard theory prior produces good frequentist properties for \(\sigma_1\) and \(\sigma_2\), but Berger and Sun (2008) have shown that the convergence to frequentist estimates is less good for \(\rho\) and for differences in means (\(\mu_1 - \mu_2\)). Instead, they recommend that \(f(\pmb\mu, \Sigma^{-1}) \propto \frac{1}{\sigma_1\sigma_2(1-\rho^2)}\) be used as a general purpose prior for bivariate normal data. To construct posteriors with this prior, rejection sampling is used: random observations are drawn from an inverse Wishart on \(n-1\) degrees of freedom and only a subset of the draws are accepted. Crawford, Garthwaite, and Ryan (2011) noticed, however, that this prior gave rise to credible intervals that were too narrow and showed with simulations that if the sample size was treated as \(n-1\) for the estimation of \(\Sigma\), so that the intervals were somewhat conservative, the frequentist properties of the estimates were improved. Crawford, Garthwaite, and Ryan (2011) recommend this and refer to it as the “calibrated” prior. The procedure for sampling from the posterior using this prior is given below (a minimal code sketch follows the steps):

  1. Since the unbiased estimate of \(\Sigma\) is \(\pmb{S}/(n-1)\), we put \(\pmb{S}^*= (n-2)\pmb{S}/(n-1)\) to retain this property when reducing the sample size.

  2. Generate an observation from an inverse Wishart distribution on \(n-2\) degrees of freedom and with scale matrix \(\pmb{S}^*\). Denote this: \[ \hat{\Sigma} = \begin{bmatrix} \hat{\sigma}^2_{1} & \hat{\sigma}_{12} \\ \hat{\sigma}_{12} & \hat{\sigma}^2_{2} \end{bmatrix} \] and put \[ \hat{\rho}= \frac{\hat{\sigma}_{12}}{\sqrt{\hat{\sigma}^2_{1}\hat{\sigma}^2_{2}}} \]

  3. Generate a value \(u\) from a uniform distribution such that \(u \sim U(0, 1)\). If \(u^2 \leq 1-\hat{\rho}^2\) we accept \(\hat\Sigma\) as the estimation of \(\Sigma\) for this iteration and denote it \[\begin{equation*} \hat{\Sigma}_{(i)}=\begin{bmatrix} \hat{\sigma}^2_{1(i)} & \hat{\sigma}_{12(i)} \\ \hat{\sigma}_{12(i)} & \hat{\sigma}^2_{2(i)} \end{bmatrix} \end{equation*}\] otherwise we iterate the procedure until we have an accepted \(\hat{\Sigma}\).
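Base R has no inverse Wishart generator, but a draw from an inverse Wishart with \(\nu\) degrees of freedom and scale matrix \(\pmb{S}^*\) can be obtained by drawing from a Wishart distribution with \(\nu\) degrees of freedom and scale matrix \(\pmb{S}^{*-1}\) and inverting the result. The sketch below illustrates steps 1-3 on that basis; the function name is ours and this is not the package's internal code.

# Sketch: one accepted draw of Sigma under the "calibrated" prior
draw_sigma_calibrated <- function(S, n) {
  S_star <- (n - 2) * S / (n - 1)                      # step 1: rescale the SSCP matrix
  repeat {
    W <- rWishart(1, df = n - 2, Sigma = solve(S_star))[, , 1]
    Sigma_hat <- solve(W)                              # step 2: inverse Wishart draw
    rho_hat <- Sigma_hat[1, 2] / sqrt(Sigma_hat[1, 1] * Sigma_hat[2, 2])
    if (runif(1)^2 <= 1 - rho_hat^2) return(Sigma_hat) # step 3: accept-reject
  }
}
S <- matrix(c(30, 12, 12, 20), nrow = 2)               # illustrative SSCP matrix
draw_sigma_calibrated(S, n = 16)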

Bayesians might argue that good frequentist properties are not necessary when conducting Bayesian analyses. If so, one can use the “standard theory” prior by generating a random observation from an inverse Wishart distribution on \(n\) degrees of freedom with scale matrix \(\pmb{S}\) and denoting it \(\hat{\Sigma}_{(i)}\). With an estimate of \(\Sigma\) (regardless of the prior chosen), follow the steps below to obtain an estimate of \(p\), that is, the percentage of the control population expected to show a more extreme task discrepancy than the case.

  1. Let \(z_{r1}\) and \(z_{r2}\) be two random draws from a standard normal distribution. Perform Cholesky decomposition on \(\hat{\Sigma}_{(i)}\), that is finding the lower triangular matrix \(\pmb{T}\) such that \(\pmb{T}\pmb{T'}=\hat{\Sigma}_{(i)}\). Then \[\begin{equation*} \pmb{\hat{\mu}}_{(i)} = \begin{pmatrix} \hat{\mu}_{1(i)} \\ \hat{\mu}_{2(i)} \end{pmatrix} = \begin{pmatrix} \overline{y}_1 \\ \overline{y}_2 \end{pmatrix}+ \pmb{T} \begin{pmatrix} z_{r1} \\ z_{r2} \end{pmatrix} / \sqrt{n} \end{equation*}\] is the estimation of \(\pmb{\mu}\) for this iteration.

  2. With estimations of \(\pmb{\mu}\) and \(\Sigma\) we can calculate \(p\), given that they are the correct values of \(\pmb{\mu}\) and \(\Sigma\). If an unstandardised test is required put: \[\begin{equation*} z_{(i)}^* = \frac{(y_1^* - \hat{\mu}_{1(i)}) - (y^*_2 - \hat{\mu}_{2(i)})} {\sqrt{\hat{\sigma}^2_{1(i)}+\hat{\sigma}^2_{2(i)}-2\hat{\sigma}_{12(i)}}} \end{equation*}\] If a standardised test is required, put: \[\begin{equation*} z_{1(i)} = \frac{y_1^* - \hat{\mu}_{1(i)}}{\sqrt{\hat{\sigma}^2_{1(i)}}}, \ z_{2(i)} = \frac{y_2^* - \hat{\mu}_{2(i)}}{\sqrt{\hat{\sigma}^2_{2(i)}}}, \ \hat{\rho}_{(i)} = \frac{\hat{\sigma}_{12(i)}}{\sqrt{\hat{\sigma}^2_{1(i)}\hat{\sigma}^2_{2(i)}}} \end{equation*}\] and \[\begin{equation*} z^*_{(i)} = \frac{z_{1(i)} - z_{2(i)}}{\sqrt{2-2\hat{\rho}_{(i)}}} \end{equation*}\]

  3. Let \(\hat{p}_{(i)}\) be the tail area of a standard normal distribution less or greater than \(z^*_{(i)}\) (depending on alternative hypothesis). \(\hat{p}_{(i)}\) is then the estimate of \(p\) for this iteration.

Repeating these steps a large number of times will yield a distribution of \(\hat{p}\), the mean of which is taken as the point estimate of \(p\) and used for hypothesis testing. Multiplied by 100 it gives the point estimate of the percentage of controls expected to exhibit a more extreme task difference than the case. If the steps are repeated e.g., 1000 times, the 25th smallest and 25th largest \(\hat{p}_{(i)}\) give the lower and upper boundaries of the 95% credible interval for \(p\). Similarly, the 25th smallest and 25th largest values of \(z^*_{(i)}\) give the lower and upper boundaries of the 95% credible interval for \(Z_{DCC}\), the point estimate of which is that of Equation (2.4).
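For concreteness, the sketch below strings the steps together using the simpler “standard theory” prior (an inverse Wishart draw on \(n\) degrees of freedom with scale matrix \(\pmb{S}\)); a draw under the calibrated prior, as sketched above, could be substituted. The data are simulated and illustrative, the alternative is taken to be the lower tail, and the package function BSDT(), described in Section 3, should be used in practice.

# Sketch of the BSDT with the "standard theory" prior and simulated controls
set.seed(1)
n <- 16
y1 <- rnorm(n, 100, 15); y2 <- 0.6 * y1 + rnorm(n, 30, 8)     # correlated control tasks
case <- c(70, 85)                                             # case scores on tasks A and B
ybar <- c(mean(y1), mean(y2))
S <- cov(cbind(y1, y2)) * (n - 1)                             # SSCP matrix
iter <- 10000
p_hat <- numeric(iter)
for (i in seq_len(iter)) {
  Sigma_hat <- solve(rWishart(1, df = n, Sigma = solve(S))[, , 1])  # draw Sigma
  mu_hat <- ybar + t(chol(Sigma_hat)) %*% rnorm(2) / sqrt(n)        # draw mu | Sigma
  z <- (case - mu_hat) / sqrt(diag(Sigma_hat))                      # standardised case scores
  rho_hat <- Sigma_hat[1, 2] / sqrt(Sigma_hat[1, 1] * Sigma_hat[2, 2])
  p_hat[i] <- pnorm((z[1] - z[2]) / sqrt(2 - 2 * rho_hat))          # tail area for Z_DCC
}
mean(p_hat)                                                   # point estimate of p
quantile(p_hat, c(0.025, 0.975))                              # 95% credible interval for p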

2.3.3 Bayesian tests allowing for covariates

Crawford, Garthwaite, and Ryan (2011) extended the previously described Bayesian tests by using Bayesian regression techniques that allow the case’s abnormality to be assessed in the presence of covariates. These tests compare the case’s score on the task of interest to the scores of controls conditioned on having the same value as the case on the covariate(s). For example, if a case has 15 years of education, his/her task score would be compared to the scores expected of controls with 15 years of education.

In addition to improving the precision of these tests by accounting for variance associated with the covariates, this also facilitates the collection of larger control samples, because it reduces the need to very closely match control samples to a single case. Moreover, it means that the same control sample may be used for the evaluation of multiple single cases, since the matching on relevant covariates of interest can be achieved statistically. One degree of freedom is lost for each covariate included, so as a rule of thumb covariates should be included only if they have a sample correlation with the variate(s) of interest stronger than \(0.3\) (Crawford, Garthwaite, and Ryan 2011). Of course, the control sample should also ideally bracket the case on the covariates, in order to avoid extrapolating “out of sample.” As for the earlier tests, the variates of interest are still assumed to follow a normal or bivariate normal distribution. However, no assumptions are made about the distribution of the covariates.

Suppose we have data from a sample of size \(n\), with scores on \(m\) covariates and \(k = 1, 2\) variates of interest, depending on whether we are testing for a deficit or a discrepancy. We denote the covariates \(\boldsymbol{X} = (X_1, \dots, X_m)\).

From these values we wish to estimate \[ \boldsymbol{B} = \begin{bmatrix} \boldsymbol{\beta}_1 & \cdots & \boldsymbol{\beta}_k \end{bmatrix} \] Where \(\boldsymbol{\beta}_i\) is a vector of length \(m+1\) containing regression coefficients for each covariate on the \(i\)th variate of interest, the first element in \(\boldsymbol{\beta}_i\) being the intercept. We also wish to estimate \[ \Sigma \ | \ \boldsymbol{X} = \begin{bmatrix} \sigma^2_1 \ | \ \boldsymbol{X} & \rho\sigma_1\sigma_2 \ | \ \boldsymbol{X} \\ \rho\sigma_1\sigma_2 \ | \ \boldsymbol{X} & \sigma^2_2 \ | \ \boldsymbol{X} \end{bmatrix} \] That is, the covariance matrix of the variates of interest (task \(Y_1\) and task \(Y_2\), i.e., \(k=2\)) conditioned upon the covariates. If we wish to test for a deficit, that is \(k = 1\), then \(\Sigma \ | \ \boldsymbol{X}\) is a \(1\times1\) matrix containing the conditional variance of the variate of interest. We assume that \(\Sigma\) does not vary with the covariates.

The procedure will be outlined in the general case, that is for \(k = 1\) or \(k = 2\). The main difference between testing for abnormality on a single variate and testing for abnormal discrepancy between two variates is the recommended specification of the prior.

Let \(\boldsymbol{X}\) be the \(n \times (m+1)\) design matrix on which we regress \(\boldsymbol{Y}\), the \(n \times k\) response matrix. \[ \boldsymbol{X} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nm} \end{bmatrix},\ \ \boldsymbol{Y} = \begin{bmatrix} y_{11} & \cdots & y_{1k} \\ \vdots & \ddots & \vdots \\ y_{n1} & \cdots & y_{nk} \end{bmatrix} \] The data estimates of \(\boldsymbol{B}\) and \(\Sigma\) are then \[ \boldsymbol{B}^* = (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{Y} \ \text{and} \ \Sigma^* = \frac{1}{n-m-1}(\boldsymbol{Y}-\boldsymbol{X}\boldsymbol{B}^*)'(\boldsymbol{Y}-\boldsymbol{X}\boldsymbol{B}^*) \] For the “standard theory” prior the posterior of \(\Sigma\) is an inverse Wishart distribution with \(df = n - m\) when \(k=2\) and \(df=n-m-1\) when \(k = 1\) (Tiao and Zellner 1964), with scale matrix \((n-m-1)\Sigma^*\). For each iteration we generate an estimate of \(\Sigma\) from this distribution. For the “calibrated” prior (only used for \(k=2\)) the steps described in Section 2.3.2 are followed, and we use \(\pmb{S}^* = (n-m-2)\pmb{S}/(n-m-1)\) as the scale matrix when we sample from an inverse Wishart distribution on \(n-m-2\) degrees of freedom, where \(\pmb{S} = (n-m-1)\Sigma^*\). The generated observation from either method is denoted \(\hat{\Sigma}_{(i)}\). We have

\[\begin{equation} \hat{\Sigma}_{(i)}=[\hat{s}^2_{(i)}] \ \text{when} \ k=1 \ \text{and} \ \hat{\Sigma}_{(i)} = \begin{bmatrix} \hat{s}^2_{1(i)} & \hat{s}_{12(i)} \\ \hat{s}_{12(i)} & \hat{s}^2_{2(i)} \end{bmatrix} \ \text{when} \ k=2 \ \tag{2.6} \label{eq:sigma} \end{equation}\]

\(\boldsymbol{B}^*\) will be an \((m+1) \times k\) matrix. We turn this into a \(k(m + 1) \times 1\) vector by concatenating the columns of regression coefficients in \(\boldsymbol{B}^*\), such that \(\boldsymbol{B}^*_{\text{vec}}=(\boldsymbol{\beta}^{*'}_1, ...,\boldsymbol{\beta}^{*'}_k)'\). This vector is then the data estimate of \(\boldsymbol{B}_{\text{vec}}=(\boldsymbol{\beta}'_1, ...,\boldsymbol{\beta}'_k)'\). Take the Kronecker product \(\boldsymbol{\hat\Sigma}_{(i)} \otimes (\boldsymbol{X'X})^{-1}\) and denote this \(\boldsymbol{\Lambda}_{(i)}\). Then the posterior of \(\boldsymbol{B}_{\text{vec}}|\boldsymbol{\hat\Sigma}_{(i)}\) is a multivariate normal distribution with mean vector \(\boldsymbol{B}^*_{\text{vec}}\) and variance-covariance matrix \(\boldsymbol{\Lambda}_{(i)}\). Draw a random value from this distribution, \(\hat{\boldsymbol{B}}_{\text{vec}(i)} =(\hat{\boldsymbol{\beta}}'_{1(i)}, ...,\hat{\boldsymbol{\beta}}'_{k(i)})'\), and rearrange it into \(\hat{\boldsymbol{B}}_{(i)} = (\hat{\boldsymbol{\beta}}_{1(i)}, ...,\hat{\boldsymbol{\beta}}_{k(i)})\), where each \(\hat{\boldsymbol{\beta}}_{j(i)}\) is a vector of length \(m+1\) and \(\hat{\boldsymbol{B}}_{(i)}\) is an \((m+1) \times k\) matrix.

Let \(\boldsymbol{x}^*\) be the vector of length \(m+1\) whose first element is 1 (for the intercept) and whose remaining elements are the case’s values on the covariates. The conditional means for the case on the tasks of interest are then \[ \hat{\boldsymbol{\mu}}_{(i)} = \hat{\boldsymbol{B}}'_{(i)}\boldsymbol{x}^* \]

We now estimate the effect size of the case using the conditional estimations derived above. Depending on whether we are testing for a deficit or a discrepancy (i.e., whether \(k=1\) or \(k=2\)) the calculations of the effect sizes differ. Crawford, Garthwaite, and Ryan (2011) call these effect sizes \(Z_{CCC}\) and \(Z_{DCCC}\) for a deficit and a discrepancy respectively. They are similar to \(Z_{CC}\) (Equation (2.2)) and \(Z_{DCC}\) (Equation (2.4)); however, the extra \(C\) in the subscript indicates that they are conditional on covariates. Denote \(y^*_1\) and \(y^*_2\) as the case’s scores on the two variates of interest. Given that we want to estimate deficiency of a case on \(Y_1\), we calculate \[\begin{equation} \hat{Z}_{CCC(i)} = \frac{y^*_1-\hat{\mu}_{(i)}}{\hat{s}_{(i)}} \tag{2.7} \label{eq:zccc} \end{equation}\] For \(Z_{DCCC}\) we have two conditional means which will be denoted \(\hat{\mu}_{1(i)}\) and \(\hat{\mu}_{2(i)}\) for \(Y_1\) and \(Y_2\) respectively. We then calculate \[\begin{equation} \hat{Z}_{DCCC(i)} = \frac{\frac{y^*_1-\hat{\mu}_{1(i)}}{\hat{s}_{1(i)}}-\frac{y^*_2-\hat{\mu}_{2(i)}} {\hat{s}_{2(i)}}}{\sqrt{2-2\hat{\rho}_{12(i)}}} \tag{2.8} \label{eq:zdccc} \end{equation}\] Where \(\hat{s}_{1(i)}\) and \(\hat{s}_{2(i)}\) are conditional standard deviations obtained from \(\hat{\Sigma}_{(i)}\) in Equation (2.6). The conditional correlation between \(Y_1\) and \(Y_2\) in the denominator is given by \[\hat{\rho}_{12(i)} = \frac{\hat{s}_{12(i)}}{\sqrt{\hat{s}^2_{1(i)}\hat{s}^2_{2(i)}}}\]

We then find the tail area under the standard normal distribution that is less or greater than \(\hat{Z}_{CCC(i)}\) or \(\hat{Z}_{DCCC(i)}\) depending on the problem at hand and the alternative hypothesis specified. Denote the value obtained \(\hat{p}_{(i)}\) which then is an estimate of \(p\). If testing for a deficit we would have: \[ \hat{p}_{(i)}= \Phi(\hat{Z}_{CCC(i)}) \]

Iterating these steps a large number of times will yield a distribution of \(\hat{p}\), the mean of which is taken as the point estimate of \(p\). This estimate is used for significance testing and, multiplied by 100, gives an estimate of the percentage of controls expected to exhibit a more extreme score than the case. If the steps are repeated e.g., 1000 times, the 25th smallest and 25th largest \(\hat{p}_{(i)}\) give the lower and upper boundaries of the 95% credible interval for \(p\).

To obtain a point estimate of \(Z_{CCC}\) and \(Z_{DCCC}\) we use Equation (2.7) and (2.8), but use the conditional means, standard deviations and correlation calculated directly from the control sample. The \(1-\alpha\) credible intervals for these effect sizes are given by the \(\alpha/2\) and \(1-\alpha/2\) quantiles of the sampled distribution of \(Z_{CCC}\) or \(Z_{DCCC}\). That is, just as for \(p\), if repeated e.g., 1000 times, the 25th smallest and 25th largest value of \(\hat{Z}_{CCC(i)}\) or \(\hat{Z}_{DCCC(i)}\) is the lower and upper boundaries of the 95% credible interval for \(Z_{CCC}\) or \(Z_{DCCC}\).
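To make the regression machinery concrete, the sketch below implements the \(k=1\) case (a Bayesian test of deficit with a single covariate) under the “standard theory” prior, using simulated illustrative data; the variable names are ours, and the package functions BTD_cov() and BSDT_cov(), described in Section 3, should be used in practice.

# Sketch: Bayesian test of deficit with one covariate (k = 1), "standard theory" prior
set.seed(1)
n <- 20; m <- 1                                      # controls and number of covariates
x <- rnorm(n, 50, 10)                                # covariate (e.g., age)
y <- 0.5 * x + rnorm(n, 0, 5)                        # variate of interest
case_x <- 55; case_y <- 10                           # case's covariate value and task score
X <- cbind(1, x)                                     # n x (m + 1) design matrix
B_star <- solve(t(X) %*% X) %*% t(X) %*% y           # data estimate of B
Sigma_star <- sum((y - X %*% B_star)^2) / (n - m - 1)
iter <- 10000
p_hat <- numeric(iter)
for (i in seq_len(iter)) {
  s2_hat <- (n - m - 1) * Sigma_star / rchisq(1, df = n - m - 1)  # draw Sigma (1 x 1)
  Lambda <- s2_hat * solve(t(X) %*% X)                            # Sigma (x) (X'X)^-1
  B_hat <- B_star + t(chol(Lambda)) %*% rnorm(m + 1)              # draw B | Sigma
  mu_hat <- sum(c(1, case_x) * B_hat)                             # conditional case mean
  p_hat[i] <- pnorm((case_y - mu_hat) / sqrt(s2_hat))             # Equation (2.7) tail area
}
mean(p_hat)                                          # point estimate of p
quantile(p_hat, c(0.025, 0.975))                     # 95% credible interval for p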

3 The singcar package

3.1 Functions for testing abnormality

Main functions for significance testing of abnormality when using small control samples, with the principal relevant reference in prior literature.

| Name | Function | Source | Reference |
|------|----------|--------|-----------|
| Test of deficit | TD() | Crawford and Howell (1998) | Equation (2.1) |
| Bayesian test of deficit | BTD() | Crawford and Garthwaite (2007) | Section 2.3.1 |
| Bayesian test of deficit with covariates | BTD_cov() | Crawford, Garthwaite, and Ryan (2011) | Section 2.3.3 |
| Unstandardised difference test | UDT() | Crawford and Garthwaite (2005) | Equation (2.3) |
| Revised standardised difference test | RSDT() | Crawford and Garthwaite (2005) | Equation (2.5) |
| Bayesian standardised difference test | BSDT() | Crawford and Garthwaite (2007) | Section 2.3.2 |
| Bayesian standardised difference test with covariates | BSDT_cov() | Crawford, Garthwaite, and Ryan (2011) | Section 2.3.3 |

To facilitate meta-analyses of studies using case-control comparisons, all functions in the table above can take either summary statistics or raw data as input. The output is a list of class "htest", for which the generic function print has a method. To use the package, begin by installing and loading it:

install.packages("singcar")
library("singcar")

3.2 Example from neuropsychology

The package comes with the dataset size_weight_illusion, a neuropsychological dataset from an investigation of the size-weight illusion in DF, a patient with visual form agnosia following bilateral lesions to the lateral occipital complex (Hassan et al. 2020). The size-weight illusion is a perceptual phenomenon in which smaller objects are perceived as heavier during lifting than larger objects of equal weight (Buckingham 2014). The illusion implies that sensory cues about object size affect the perception of weight. It has been suggested that patient DF does not experience this illusion in the same way as the healthy population when only visual cues about object size are available (Dijkerman et al. 2004). In contrast, when kinaesthetic (tactile) cues are provided it is suggested that DF’s experience of the illusion is unaffected by her brain damage. In other words, patient DF was expected to have a deficit in the visual size-weight illusion and exhibit an abnormally large discrepancy (dissociation) between visual and kinaesthetic size-weight illusions. The dataset consists of data from patient DF and 28 control participants, with variables for sex, age, and the visual and kinaesthetic size-weight illusions. The measure of the size-weight illusion is a scaled measure expressing the number of grams weight difference perceived per cubic cm of volume change. Examples of how to analyse this dataset using the tests provided in singcar follow below.

head(size_weight_illusion)
##   GROUP PPT    SEX YRS      V_SWI      K_SWI
## 1    SC  DF Female  65 0.02814921 0.10012712
## 2    HC E01 Female  70 0.21692634 0.21792930
## 3    HC E02   Male  64 0.17223226 0.26338899
## 4    HC E03 Female  63 0.07138049 0.09331700
## 5    HC E04 Female  65 0.10186453 0.25938045
## 6    HC E05 Female  66 0.15911439 0.08922615

3.2.1 Testing for a deficit

The simplest way to test for an abnormality on a single variate is to use the frequentist test of deficit. Start by extracting patient (patient DF is the first observation) and control data from the relevant variate, in this case the visual size-weight illusion:

PAT_VSWI <- size_weight_illusion[1, "V_SWI"]
CON_VSWI <- size_weight_illusion[-1, "V_SWI"] 

Using the function TD() we then apply the formula in Equation (2.1). The argument conf_int_spec specifies how fine-grained the search algorithm for the confidence interval should be. The arguments sd and sample_size can be given if the test should be based on summary statistics rather than raw data; the controls argument should then be given the mean of the control sample.

TD(case = PAT_VSWI,
   controls = CON_VSWI,
   sd = NULL,
   sample_size = NULL,
   alternative = "less",
   conf_int = TRUE,
   conf_level = 0.95,
   conf_int_spec = 0.01,
   na.rm = FALSE)
## 
##  Test of Deficit
## 
## data:  Case = 0.03, Controls (m = 0.16, sd = 0.08, n = 28)
## t = -1.7243, df = 27, p-value = 0.04804
## alternative hypothesis: true diff. between case and controls is less than 0
## sample estimates:
##   Std. case score (Z-CC), 95% CI [-2.34, -1.15] 
##                                       -1.754857 
## Proportion below case (%), 95% CI [0.95, 12.47] 
##                                        4.804003

This can similarly be tested with the Bayesian analogue, which has a very similar syntax. This test yields an output that converges to that of TD() as the argument for the number of iterations (iter) increases. The degrees of freedom shown in the output below are those of the \(\chi^2\) distribution from which \(\psi\) is sampled, as described in Section 2.3.1.

set.seed(42)
BTD(case = PAT_VSWI,
    controls = CON_VSWI,
    sd = NULL,
    sample_size = NULL,
    alternative = "less",
    int_level = 0.95,
    iter = 10000,
    na.rm = FALSE)
## 
##  Bayesian Test of Deficit
## 
## data:  Case = 0.03, Controls (m = 0.16, sd = 0.08, n = 28)
## df = 27, p-value = 0.04821
## alternative hypothesis: true diff. between case and controls is less than 0
## sample estimates:
##   Std. case score (Z-CC), 95% CI [-2.35, -1.15] 
##                                       -1.754857 
## Proportion below case (%), 95% CI [0.94, 12.60] 
##                                        4.821283

If the control sample for a study is not appropriately matched to the case on variables such as age or education level, it is advisable to use tests that account for this by allowing for the inclusion of covariates. Including theoretically sound covariates is often a good idea. Crawford, Garthwaite, and Ryan (2011) recommend, however, including a covariate only if it correlates \(\geq 0.3\) with the variate of interest, because of the loss of degrees of freedom.

The function BTD_cov() allows for the inclusion of covariates and therefore assesses the patient on the task of interest by essentially comparing him/her to the controls with the same score on the covariate. Even though the correlation between age and visual size-weight illusion is \(< 0.3\), it is included here as a covariate for demonstrative purposes. Start again by extracting the scores on the covariate for the patient and for the control participants.

PAT_age <- size_weight_illusion[1, "YRS"] 
CON_age <- size_weight_illusion[-1, "YRS"]

Since BTD_cov() is somewhat computationally intense, the number of iterations has been reduced in the example compared to BTD(). For actual analysis the number of iterations should be based on required precision. It should be noted that there is no restriction on the number of covariates used as long as \(n > m+1\). If more than one covariate is used the case scores should be given as a vector of values and the control scores should be given as a data frame or matrix with the same number of columns as the number of values in the covariate vector of the case.

If summary statistics are used instead of raw data, the argument use_sumstats must be set to TRUE, and the correlation matrix for the covariates and variate of interest must be given as well as the sample size. In addition, the control_covar argument must be supplied as an \(m \times 2\) matrix or data frame giving the mean of the covariate(s) in the first column and the standard deviation in the second. The degrees of freedom shown in the output below are those of the inverse Wishart distribution from which \(\hat{\Sigma}\) is sampled, as described in Section 2.3.3.

BTD_cov(case_task = PAT_VSWI,
        case_covar = PAT_age,
        control_task = CON_VSWI,
        control_covar = CON_age,
        alternative = "less",
        int_level = 0.95,
        iter = 1000,
        use_sumstats = FALSE,
        cor_mat = NULL,
        sample_size = NULL)
## 
##  Bayesian Test of Deficit with Covariates
## 
## data:  Case = 0.03, Controls (m = 0.16, sd = 0.08, n = 28)
## df = 26, p-value = 0.05173
## alternative hypothesis: true diff. between case and controls is less than 0
## sample estimates:
##  Std. case score (Z-CCC), 95% CI [-2.30, -1.10] 
##                                       -1.749556 
## Proportion below case (%), 95% CI [1.08, 13.47] 
##                                        5.172973

3.2.2 Testing for a dissociation

For assessing abnormal discrepancy between two variates the simplest function to use is the unstandardised difference test, Equation (2.3). This test is called in singcar by the function UDT(). However, it should be used only if the variates are known to come from equivalent distributions. Otherwise, tests that can evaluate standardised scores without inflating Type I errors should be used. In the frequentist framework the appropriate test for this is the RSDT, Equation (2.5). In this example we wish to estimate and test the abnormality of the discrepancy between visual and kinaesthetic size-weight illusion in patient DF. That is, we want to compare the difference between the variates exhibited by the patient to the distribution of differences in the healthy control sample. Again, start by extracting the patient and control scores for the second variate of interest.

PAT_KSWI <- size_weight_illusion[1, "K_SWI"] 
CON_KSWI <- size_weight_illusion[-1, "K_SWI"] 

Using the function RSDT() we then apply the formula in Equation (2.5). This test is most often used two-sided because the sign of the discrepancy depends solely on the order of the input. This function does not, however, provide any confidence intervals. The syntax of UDT() is very similar to that of RSDT(), the main difference being options for confidence intervals as shown for TD(). If summary statistics are used, then the additional argument r_ab, which is the sample correlation, must be set as well as the sample size, standard deviation and mean for both variates.

RSDT(case_a = PAT_VSWI,
     case_b = PAT_KSWI,
     controls_a = CON_VSWI,
     controls_b = CON_KSWI,
     sd_a = NULL,
     sd_b = NULL,
     sample_size = NULL,
     r_ab = NULL,
     alternative = "two.sided",
     na.rm = FALSE)
## 
##  Revised Standardised Difference Test
## 
## data:  Case A: 0.03, B: 0.10, Ctrl. A (m, sd): (0.16, 0.08), B: (0.18, 0.10)
## approx. abs. t = 1.015, df = 27, p-value = 0.3191
## alternative hypothesis: true difference between tasks is not equal to 0
## sample estimates:
## Std. case score, task A (Z-CC) Std. case score, task B (Z-CC) 
##                     -1.7548574                     -0.7836956 
##  Std. task discrepancy (Z-DCC)      Proportion below case (%) 
##                     -1.0647889                     15.9560625

The Bayesian analogue of this test is recommended over the RSDT because it maintains better control of Type I errors when the case exhibits extreme deficits on both variates but no discrepancy between them (Crawford and Garthwaite 2007). The syntax of the two functions is similar, but BSDT() comes with more optional arguments. For example, one has the option of applying this test without standardising the variates by setting the argument unstandardised to TRUE. The output then converges to that of the frequentist UDT. Furthermore, one can choose between priors. Setting the argument calibrated to FALSE specifies the use of the “standard theory” prior. If left at the default (TRUE), an accept-reject algorithm is deployed for each simulation of \(\Sigma\), as described in Section 2.3.2. This default prior has been shown to have better frequentist properties for estimating \(\rho\) and differences between means in a bivariate normal distribution (Berger and Sun 2008) and is therefore the default and recommended choice.

BSDT(case_a = PAT_VSWI,
     case_b = PAT_KSWI,
     controls_a = CON_VSWI,
     controls_b = CON_KSWI,
     sd_a = NULL,
     sd_b = NULL,
     sample_size = NULL,
     r_ab = NULL,
     alternative = "two.sided",
     int_level = 0.95,
     iter = 10000,
     unstandardised = FALSE,
     calibrated = TRUE,
     na.rm = FALSE)
## 
##  Bayesian Standardised Difference Test
## 
## data:  Case A: 0.03, B: 0.10, Ctrl. A (m, sd): (0.16,0.08), B: (0.18,0.10)
## df = 26, p-value = 0.3245
## alternative hypothesis: true difference between tasks is not equal to 0
## sample estimates:
##                  Std. case score, task A (Z-CC) 
##                                      -1.7548574 
##                  Std. case score, task B (Z-CC) 
##                                      -0.7836956 
## Std. discrepancy (Z-DCC), 95% CI [-1.69, -0.42] 
##                                      -1.0647889 
## Proportion below case (%), 95% CI [4.51, 33.81] 
##                                      16.2259126

If analysing discrepancies between variates in the presence of covariates the syntax is slightly different, requiring that one specifies the case’s scores on the variates of interest as a vector and the control scores as a data frame or matrix. One has the option to choose between the “calibrated” and “standard theory” prior, just as for BSDT(), where calibrated = TRUE is the recommended and default behaviour. If using summary statistics use_sumstats must be set to TRUE and the summary input should be supplied to the arguments control_tasks/control_covar as data frames or matrices with the means of each variable represented by the first column and the standard deviations by the second. The case’s scores should be supplied as vectors.

BSDT_cov(case_tasks = c(PAT_VSWI, PAT_KSWI),
         case_covar = PAT_age,
         control_tasks = cbind(CON_VSWI, CON_KSWI),
         control_covar = CON_age,
         alternative = "two.sided",
         int_level = 0.95,
         calibrated = TRUE,
         iter = 1000,
         use_sumstats = FALSE,
         cor_mat = NULL,
         sample_size = NULL)
## 
##  Bayesian Standardised Difference Test with Covariates
## 
## data:  Case A = 0.03, B = 0.10, Ctrl. (m, sd) A: (0.16,0.08), B: (0.18,0.10)
## df = 25, p-value = 0.3383
## alternative hypothesis: true difference between tasks is not equal to 0
## sample estimates:
##                   Std. case score, task A (Z-CC) 
##                                        -1.754857 
##                   Std. case score, task B (Z-CC) 
##                                        -0.783696 
## Std. discrepancy (Z-DCCC), 95% CI [-1.60, -0.38] 
##                                        -1.064152 
##  Proportion below case (%), 95% CI [5.53, 35.32] 
##                                        16.920000

3.2.3 Power calculators

A further capacity of singcar is that it can be used to calculate statistical power of the tests. The notion of power when comparing cases to control samples has been somewhat overlooked for this class of statistical tests. In recent work (McIntosh and Rittmo 2020), we argued that, even though power is inherently limited in this paradigm, a priori calculations are still useful for study design and interpretation in neuropsychological and other applications. Calculating power for the test of deficit is similar to calculating power for any \(t\) test and can be done analytically. \[\begin{equation} power = 1 - \beta = T_{n-1}\left(t_{\alpha, \ n-1} \Bigg\rvert \frac{y^* - \overline{y}}{\sigma \sqrt{\frac{n+1}{n}}}\right) \tag{3.1} \label{eq:TDpower} \end{equation}\] Where \(T_{n-1}(.\rvert \theta)\) is the cumulative distribution function for the non-central \(t\) distribution with \(n-1\) degrees of freedom and non-centrality parameter \(\frac{y^* - \overline{y}}{\sigma \sqrt{\frac{n+1}{n}}}\) (i.e., the TD statistic, Equation (2.1)), and \(t_{\alpha, \ n-1}\) is the \(\alpha\) quantile of the \(t\) distribution on \(n-1\) degrees of freedom (note that this is for a one-sided test). For the unstandardised difference test, power is calculated in an analogous way by putting Equation (2.3) as the non-centrality parameter. Deriving power for the other functions in an analytic manner is, however, not possible (the RSDT is only approximately \(t\) distributed), and a Monte Carlo approach has been used for these tests. To call any power calculator in the package one simply uses the function name with _power added as a suffix.
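Equation (3.1) can be evaluated directly with the non-central \(t\) distribution functions in base R. The sketch below uses the same illustrative values as the TD_power() example that follows and should essentially reproduce its result for a one-sided test.

# Sketch: analytic power for the test of deficit (Equation 3.1), one-sided
case <- 70; m <- 100; s <- 15; n <- 16; alpha <- 0.05
ncp <- (case - m) / (s * sqrt((n + 1) / n))     # non-centrality parameter
pt(qt(alpha, df = n - 1), df = n - 1, ncp = ncp)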

So, for example, to calculate power for the test of deficit we call TD_power(). The expected case score and either sample size or desired power must be supplied. The mean and standard deviation of the control sample can also be specified with the arguments mean and sd. If not, they take the default values of 0 and 1 respectively so that the case score is interpreted as distance from the mean in standard deviations. A conventional \(\alpha\)-level of \(0.05\) is assumed if nothing else is supplied. The alternative hypothesis can also be specified by the argument alternative: specify "less" (default) or "greater" for a one-tailed test, specify "two.sided" for a two-tailed test.

TD_power(case = 70,
         mean = 100,
         sd = 15,
         sample_size = 16,
         power = NULL,
         alternative = "less",
         alpha = 0.05,
         spec = 0.005)
## [1] 0.5819579

TD_power() can also be used to calculate required sample size for a desired level of power. For example, if we specify a desired power level of 0.6, leave sample_size to its default and let the rest of the arguments be as in the previous example, we see from the output of the function that power will not increase more than 0.5% for any additional participant after a sample size of 15. That is, the algorithm stops searching when this level of specificity has been reached and we are nearing the asymptotic maximum power for this effect size. We can increase the specificity by lowering the spec argument.

TD_power(case = 70,
         mean = 100,
         sd = 15,
         sample_size = NULL,
         power = 0.6,
         alternative = "less",
         alpha = 0.05,
         spec = 0.005)
## Power (0.578) will not increase more than 0.5% per participant for n > 15
##    n     power
## 1 15 0.5780555

Power calculators for the Bayesian tests of deficit cannot calculate required sample size. This is because they rely on simulation to estimate approximate power, and deploying a search algorithm to find the sample size required for a given level of power would be too computationally intensive. The syntax is otherwise similar to that of TD_power(). BTD_power() has two extra arguments, nsim and iter, giving the number of simulations used by the power function and by BTD(), respectively.

BTD_power(case = 70,
          mean = 100,
          sd = 15,
          sample_size = 15,
          alternative = "less",
          alpha = 0.05,
          nsim = 1000,
          iter = 1000)
## [1] 0.593

The only differences in the syntax of BTD_cov_power() are due to the inclusion of covariates. The variate of interest is specified in the argument control_task as a vector whose first element gives the mean and second element the standard deviation. The covariates can be specified in the same way, or as an \(m \times 2\) matrix in which the first column gives the means of the covariates and the second column their standard deviations. The correlation matrix of the variates must also be supplied. In the example below, power is evaluated for a test taking two covariates into account, both with a mean of 0 and a standard deviation of 1. The correlations are specified as a \(3\times 3\) matrix with pairwise correlations of \(0.3\). The default settings include only one covariate, having a \(0.3\) correlation with the variate of interest. Because this function is computationally intensive, the number of simulations has been reduced for the example below.

covars <- matrix(c(0, 1,
                   0, 1), ncol = 2, byrow = TRUE)
BTD_cov_power(case = -2,
              case_cov = c(0.2, -0.6),
              control_task = c(0, 1),
              control_covar = covars,
              cor_mat = diag(3) + 0.3 - diag(c(0.3, 0.3, 0.3)),
              sample_size = 15,
              alternative = "less",
              alpha = 0.05,
              nsim = 100,
              iter = 100)
## [1] 0.5
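
With a single covariate, control_covar can instead be given as a vector of its mean and standard deviation, together with a \(2 \times 2\) correlation matrix. The call below is a hedged sketch of that simpler case, spelling out what the text above describes as the default covariate settings (one covariate correlating \(0.3\) with the variate of interest); consult the package documentation for the exact defaults.

# One covariate with mean 0 and SD 1, correlating 0.3 with the task of interest
BTD_cov_power(case = -2,
              case_cov = 0.2,
              control_task = c(0, 1),
              control_covar = c(0, 1),
              cor_mat = matrix(c(1,   0.3,
                                 0.3,   1), ncol = 2, byrow = TRUE),
              sample_size = 15,
              alternative = "less",
              alpha = 0.05,
              nsim = 100,
              iter = 100)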

For the difference tests one must supply the expected case scores on both variates as well as the sample size. The means and standard deviations of the control sample can also be specified; if unspecified, they take the default values of 0 and 1 respectively, so that the expected case scores are interpreted as distances from the means in standard deviations. RSDT_power(), BSDT_power() and UDT_power() additionally require an estimate of the sample correlation between the variates of interest, r_ab; if this is not specified, a correlation of 0.5 is assumed by default. For BSDT_cov_power() the correlation matrix between the variates of interest and the covariates must instead be supplied (i.e., a matrix of at least \(3\times3\), in which the first correlation is that between the two variates of interest).

The alternative hypothesis is assumed by default to be "two.sided", since the direction of the effect depends on the order of the inputs, but it can be specified as "less" or "greater" as well. The syntax is similar across these functions, with small differences. For UDT_power() one can request the required sample size for a desired power, as for TD_power(). The calculators for the Bayesian tests have the extra argument calibrated, which specifies the prior. BSDT_cov_power() requires input in the same format as BTD_cov_power() for both control_tasks and control_covar. The two examples below demonstrate usage for RSDT_power() and BSDT_cov_power().

RSDT_power(case_a = 70,
           case_b = 55,
           mean_a = 100,
           mean_b = 50,
           sd_a = 15,
           sd_b = 10,
           r_ab = 0.5,
           sample_size = 15,
           alternative = "two.sided",
           alpha = 0.05,
           nsim = 1000)
## [1] 0.607
cor_mat <- matrix(c(1,   0.5, 0.6,
                    0.5,   1, 0.3,
                    0.6, 0.3,   1), ncol = 3, byrow = TRUE)
BSDT_cov_power(case_tasks = c(70, 55),
               case_cov = 65,
               control_tasks = matrix(c(100, 15,
                                        50, 10), ncol = 2, byrow = TRUE),
               control_covar = c(50, 25),
               cor_mat = cor_mat,
               sample_size = 15,
               alternative = "two.sided",
               alpha = 0.05,
               nsim = 100,
               iter = 100,
               calibrated = TRUE)
## [1] 0.74
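
As noted above, UDT_power() can also search for the sample size needed to reach a desired power, analogously to TD_power(). The sketch below assumes that its arguments parallel those of RSDT_power() (without nsim, since the calculation is analytic) together with the power and spec arguments of TD_power(); check the function documentation before relying on these names.

UDT_power(case_a = 70,
          case_b = 55,
          mean_a = 100,
          mean_b = 50,
          sd_a = 15,
          sd_b = 10,
          r_ab = 0.5,
          sample_size = NULL,
          power = 0.6,
          alternative = "two.sided",
          alpha = 0.05,
          spec = 0.005)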

4 Summary

The singcar package for R has been outlined and its main functionality described. The package provides methods for estimating the abnormality of a single case compared to a population whose parameters are estimated from a small sample. Methods for estimating abnormality on a single variate have been described, as well as methods for estimating abnormality of the difference between two variates. Historically, these tests have mainly been used within the field of neuropsychology, but their potential applicability is far wider, extending to other areas of psychology and perhaps even to completely new fields.

In neuropsychology the methods developed by John Crawford and Paul Garthwaite are frequently used, especially the test of deficit, Equation (2.1), and the revised standardised difference test, Equation (2.5). However, the Bayesian standardised difference test, which outperforms the RSDT in its control of Type I errors, has not gained as much traction; neither have the more flexible Bayesian methods allowing for the inclusion of covariates. It is hoped that, by providing them in a documented package for a popular language such as R, they will receive further uptake.

The methods described were developed to maintain transparent control over Type I errors, but power calculators have been implemented in the package as well. Consideration of power can assist researchers in study design and in setting realistic expectations for what these types of statistical hypothesis tests can achieve (McIntosh and Rittmo 2020).

Berger, James O., and Dongchu Sun. 2008. “Objective Priors for the Bivariate Normal Model.” The Annals of Statistics 36 (2): 963–82. https://www.jstor.org/stable/25464652.
Buckingham, Gavin. 2014. “Getting a Grip on Heaviness Perception: A Review of Weight Illusions and Their Probable Causes.” Experimental Brain Research 232 (6): 1623–29. https://doi.org/10.1007/s00221-014-3926-9.
Cohen, J. 1988. Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates.
Crawford, John, and Paul Garthwaite. 2002. “Investigation of the Single Case in Neuropsychology: Confidence Limits on the Abnormality of Test Scores and Test Score Differences.” Neuropsychologia 40 (8): 1196–1208. https://doi.org/10.1016/S0028-3932(01)00224-X.
———. 2005. “Testing for Suspected Impairments and Dissociations in Single-Case Studies in Neuropsychology: Evaluation of Alternatives Using Monte Carlo Simulations and Revised Tests for Dissociations.” Neuropsychology 19 (3): 318–31. https://doi.org/10.1037/0894-4105.19.3.318.
———. 2006. “Methods of Testing for a Deficit in Single-Case Studies: Evaluation of Statistical Power by Monte Carlo Simulation.” Cognitive Neuropsychology 23 (6): 877–904. https://doi.org/10.1080/02643290500538372.
———. 2007. “Comparison of a Single Case to a Control or Normative Sample in Neuropsychology: Development of a Bayesian Approach.” Cognitive Neuropsychology 24 (4): 343–72. https://doi.org/10.1080/02643290701290146.
———. 2012. “Single-Case Research in Neuropsychology: A Comparison of Five Forms of t-Test for Comparing a Case to Controls.” Cortex 48 (8): 1009–16. https://doi.org/10.1016/j.cortex.2011.06.021.
Crawford, John, Paul Garthwaite, and D. Howell. 2009. “On Comparing a Single Case with a Control Sample: An Alternative Perspective.” Neuropsychologia 47 (13): 2690–95. https://doi.org/10.1016/j.neuropsychologia.2009.04.011.
Crawford, John, Paul Garthwaite, D. Howell, and C. Gray. 2004. “Inferential Methods for Comparing a Single Case with a Control Sample: Modified t-Tests Versus Mycroft et al.’s (2002) Modified ANOVA.” Cognitive Neuropsychology 21 (7): 750–55. https://doi.org/10.1080/02643290342000276.
Crawford, John, Paul Garthwaite, and S. Porter. 2010. “Point and Interval Estimates of Effect Sizes for the Case-Controls Design in Neuropsychology: Rationale, Methods, Implementations, and Proposed Reporting Standards.” Cognitive Neuropsychology 27 (3): 245–60. https://doi.org/10.1080/02643294.2010.513967.
Crawford, John, Paul Garthwaite, and K. Ryan. 2011. “Comparing a Single Case to a Control Sample: Testing for Neuropsychological Deficits and Dissociations in the Presence of Covariates.” Cortex 47 (10): 1166–78. https://doi.org/10.1016/j.cortex.2011.02.017.
Crawford, John, and D. Howell. 1998. “Comparing an Individual’s Test Score Against Norms Derived from Small Samples.” The Clinical Neuropsychologist 12 (4): 482–86. https://doi.org/10.1076/clin.12.4.482.7241.
Crawford, John, D. Howell, and Paul Garthwaite. 1998. “Payne and Jones Revisited: Estimating the Abnormality of Test Score Differences Using a Modified Paired Samples t Test.” Journal of Clinical and Experimental Neuropsychology 20 (6): 898–905. https://doi.org/10.1076/jcen.20.6.898.1112.
Dijkerman, H. Chris, Sandra Lê, Jean-François Démonet, and A. David Milner. 2004. “Visuomotor Performance in a Patient with Visual Agnosia Due to an Early Lesion.” Cognitive Brain Research 20 (1): 12–25. https://doi.org/10.1016/j.cogbrainres.2003.12.007.
Garthwaite, Paul, and John Crawford. 2004. “The Distribution of the Difference Between Two t-Variates.” Biometrika 91 (4): 987–94.
Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. 2013. Bayesian Data Analysis, Third Edition. Chapman & Hall/CRC Texts in Statistical Science. Taylor & Francis. https://books.google.se/books?id=ZXL6AQAAQBAJ.
Hassan, Eleanor K, Anna Sedda, Gavin Buckingham, and Robert D McIntosh. 2020. “The Size-Weight Illusion in Visual Form Agnosic Patient DF.” Neurocase, August, 1–8. https://doi.org/10.1080/13554794.2020.1800748.
McIntosh, Robert D., and Jonathan Ö. Rittmo. 2020. “Power Calculations in Single-Case Neuropsychology: A Practical Primer.” Cortex 135: 146–58. https://doi.org/10.1016/j.cortex.2020.11.005.
Payne, R. W., and H. G. Jones. 1957. “Statistics for the Investigation of Individual Cases.” Journal of Clinical Psychology 13 (2): 115–21.
R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Shallice, Tim. 1988. From Neuropsychology to Mental Structure. Cambridge: Cambridge University Press.
Sokal, R. R., and F. J. Rohlf. 1981. Biometry: The Principles and Practice of Statistics in Biological Research. W. H. Freeman. https://books.google.co.uk/books?id=C-OTQgAACAAJ.
Tiao, George C., and Arnold Zellner. 1964. “On the Bayesian Estimation of Multivariate Regression.” Journal of the Royal Statistical Society: Series B (Methodological) 26 (2): 277–85. https://doi.org/10.1111/j.2517-6161.1964.tb00560.x.