Closed Testing Procedure

Paul Jordan

2021-04-27

The Closure Principle

The closure principle is a way to control the familywise type-I error rate under multiple testing. Here, we follow the description in Bretz, Hothorn, and Westfall (2011). It consists of four steps:

  1. Definition of a set \({H} = \{H_1,\ldots, H_n\}\) of elementary hypotheses.

  2. Construction of the closure set (“Hypothesis Tree”).

\[\overline{H} = \left\{ H_I = \bigcap_{i \in I} H_i : \quad \emptyset \neq I \subseteq \{1,\ldots,n\} \right\}\] \[(\text{all intersection hypotheses } H_I).\]

  3. Construction of a local level-\(\alpha\) test for each \(H_I \in \overline{H}\).

  4. Rejection of \(H_i\) if all null hypotheses \(H_I \in \overline{H}\) with \(i \in I\) are rejected at the local level \(\alpha\).
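
As a small worked illustration (not taken from the package), consider \(n = 3\) elementary hypotheses and assume that all their intersections are distinct. The closure set then has seven members, and \(H_1\) is rejected only if the four members that involve it are all rejected:

\[\overline{H} = \{H_1,\; H_2,\; H_3,\; H_{12},\; H_{13},\; H_{23},\; H_{123}\}, \qquad H_{12} = H_1 \cap H_2, \;\ldots,\; H_{123} = H_1 \cap H_2 \cap H_3\]

\[H_1 \text{ is rejected} \iff H_1,\; H_{12},\; H_{13} \text{ and } H_{123} \text{ are each rejected by their local level-}\alpha \text{ test.}\]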

Adjusted p-values

As the null hypothesis \(H_i\) is rejected only if all null hypotheses \(H_I \in \overline{H}\) with \(i \in I\) are rejected (see step 4 above), the adjusted p-value \(p_{adj;i}\) for \(H_i\) is defined as follows:

  1. Denote by \(p_I\) the p-value for a given intersection hypothesis \(H_I, \quad I \subseteq \{1, \ldots,n\}\).
  2. Then, \(p_{adj;i}=\max\limits_{I:\, i\in I} p_I,\quad i=1,\ldots , n\).
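
The definition can be sketched in a few lines of base R. This is only a toy illustration: the three elementary hypotheses, the local p-values and all object names are hypothetical, and every non-empty index set is treated as a distinct intersection hypothesis.

```r
# Toy illustration with n = 3 elementary hypotheses.
# All names and p-values below are hypothetical.
n <- 3

# Every non-empty index set I of {1, ..., n}: one entry per intersection hypothesis H_I
index_sets <- unlist(
  lapply(seq_len(n), function(k) combn(n, k, simplify = FALSE)),
  recursive = FALSE
)
names(index_sets) <- vapply(index_sets, paste, character(1), collapse = "")

# Hypothetical local p-values p_I, in the order "1", "2", "3", "12", "13", "23", "123"
p_local <- c(0.010, 0.040, 0.300, 0.020, 0.030, 0.200, 0.045)
names(p_local) <- names(index_sets)

# p_adj;i = maximum of p_I over all index sets I that contain i
p_adj <- vapply(seq_len(n), function(i) {
  in_testing_set <- vapply(index_sets, function(I) i %in% I, logical(1))
  max(p_local[in_testing_set])
}, numeric(1))
names(p_adj) <- paste0("H", seq_len(n))

print(p_adj)
```

With these made-up numbers the adjusted p-values are 0.045, 0.2 and 0.3, so at \(\alpha = 0.05\) only \(H_1\) would be rejected, although the unadjusted p-value of \(H_2\) (0.040) is also below 0.05.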

Implementation

The package was designed in particular for treatment comparisons in ANOVA-like situations.

Closure set

The hypothesis tree of the closed testing procedure is created using the function IntersectHypotheses.

Local tests for a given “hypothesis tree”

In the case of single hypotheses (i.e. if the hypothesis can be described by a single integer vector, e.g. (1,3,5)), the test (F-test, Kruskal-Wallis test, probability test, logrank test, …) is applied directly.

For combined hypotheses (i.e. for hypotheses described by several non-overlapping integer vectors, e.g. (1,2), (3,4)), the procedure differs between (generalised) linear hypotheses and other tests.

In the case of generalised linear hypotheses, the contrast matrices of the single hypotheses included are combined, and these contrasts are tested simultaneously. For this purpose, functions from the package emmeans (Lenth 2020) are used, as for all other linear and generalised linear hypotheses. For all other tests, the p-values \(p_1, p_2, \ldots ,p_m\) of the single hypotheses are calculated first and then combined by Fisher’s combination rule:

If the \(m\) p-values are assumed to be independent, the test statistic \(X\) follows, under \(H_0\), a \(\chi^2\)-distribution with \(2m\) degrees of freedom: \[ X=-2\sum_{i=1}^{m}\ln(p_i) \sim \chi_{2m}^2\] from which a p-value for the global hypothesis is obtained as the upper-tail probability of \(X\).
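
A minimal base-R sketch of this combination rule is given below. The helper name `fisher_combine` and the two p-values are hypothetical; this is not the package's internal code.

```r
# Fisher's combination rule for m independent p-values.
# `fisher_combine` is a hypothetical helper name, not a function of the package.
fisher_combine <- function(p) {
  stopifnot(is.numeric(p), all(p > 0), all(p <= 1))
  m <- length(p)
  X <- -2 * sum(log(p))                       # test statistic
  pchisq(X, df = 2 * m, lower.tail = FALSE)   # upper tail of chi-square with 2m df
}

p_single <- c(0.05, 0.10)   # hypothetical p-values of two single hypotheses
p_combined <- fisher_combine(p_single)
print(p_combined)
```

For \(m = 1\) the rule returns the p-value unchanged, and for \(m = 2\) it reduces to \(t(1 - \ln t)\) with \(t = p_1 p_2\), which gives a quick plausibility check.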

In the case of trend tests, the same type of test is applied for all intermediate single tests.

Adjusted p-values

Finally, the p-value of each elementary hypothesis is adjusted by taking the maximum of the local p-values over all hypotheses in its testing set.

The function AnalyseCTP calculates all local p-values and the adjusted p-values for all elementary hypotheses.

With the function Adjust_raw, it is also possible to use p-values that have been calculated by other functions or software to calculate the adjusted p-values.

Testing set for a specific elementary hypothesis

The testing set for a specific elementary hypothesis can be printed by the function TestingSet.

Comparing means

The data frame pasi comprises the changes in PASI score (Psoriasis Area and Severity Index) from baseline within two months in 72 patients treated with three different doses of Etretin or Placebo in a double-blind study.

The elementary hypotheses 1:2, 1:3, 1:4, i.e. \(H_1: \mu_1=\mu_2\), \(H_2: \mu_1=\mu_3\) and \(H_3: \mu_1=\mu_4\), are tested simultaneously using the F-test. That is, the groups with levels 2, 3 and 4 are compared to the control (Placebo) group (level 1). In this specific example, the adjusted and unadjusted p-values are the same. All doses show a significant effect compared to Placebo.
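
How a local F-test for an intersection hypothesis such as \(\mu_1=\mu_2=\mu_3\) can be built from combined contrasts is sketched below in base R. This is only an illustration under stated assumptions: the data are simulated (not the pasi data), `joint_F` and the contrast objects are hypothetical names, and the package itself relies on emmeans rather than on this code.

```r
# Simulated data for four groups (NOT the pasi data)
set.seed(1)
g <- factor(rep(1:4, each = 6))
y <- rnorm(24) + c(0, 1, 1.5, 1.5)[g]

# Cell-means parametrisation: one coefficient per group mean
fit <- lm(y ~ 0 + g)

# Joint F-test of C %*% mu = 0 (hypothetical helper, not part of the package)
joint_F <- function(fit, C) {
  b  <- coef(fit)
  V  <- vcov(fit)
  q  <- nrow(C)
  Cb <- C %*% b
  F_stat <- as.numeric(t(Cb) %*% solve(C %*% V %*% t(C)) %*% Cb) / q
  pf(F_stat, q, df.residual(fit), lower.tail = FALSE)
}

# Contrasts of the elementary hypotheses mu1 = mu2, mu1 = mu3, mu1 = mu4
C12 <- rbind(c(1, -1,  0,  0))
C13 <- rbind(c(1,  0, -1,  0))
C14 <- rbind(c(1,  0,  0, -1))

p_12   <- joint_F(fit, C12)                    # elementary hypothesis
p_123  <- joint_F(fit, rbind(C12, C13))        # intersection mu1 = mu2 = mu3
p_1234 <- joint_F(fit, rbind(C12, C13, C14))   # global hypothesis

print(c(p_12 = p_12, p_123 = p_123, p_1234 = p_1234))
```

Stacking all three contrasts reproduces the overall F-test of the one-way ANOVA, which is a convenient check that the contrasts were combined correctly.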

Another hypothesis structure

Here, the elementary hypotheses 1:2, 2:3, 3:4, i.e. \(H_1: \mu_1=\mu_2\), \(H_2: \mu_2=\mu_3\) and \(H_3: \mu_3=\mu_4\), are tested simultaneously using the F-test. This gives quite different results (compared to pasi.ctp.F1): there is no further improvement for higher doses.

Other tests

For the same hypothesis structure, other tests can also be used:

Proportions

The data set colorectal contains the response rates from a dose-finding study in metastatic colorectal cancer. Two doses of the experimental drug were compared to the standard treatment. The response rates in the two dose groups are compared to the control response rate using both the \(\chi^2\)-test and Fisher’s exact test.
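
The structure of this comparison can be sketched with base R alone. The counts below are hypothetical (they are not the colorectal data), and switching off the continuity correction in the \(\chi^2\)-test is an assumption of this sketch, not a documented setting of the package.

```r
# Hypothetical responder counts (NOT the colorectal data):
# group 1 = standard treatment, groups 2 and 3 = two doses of the experimental drug
responders <- c(10, 18, 22)
n_patients <- c(50, 50, 50)
tab <- rbind(responders, non_responders = n_patients - responders)

# Local tests of the elementary hypotheses [12] and [13] (dose vs. control)
p_chisq  <- c("[12]" = chisq.test(tab[, c(1, 2)], correct = FALSE)$p.value,
              "[13]" = chisq.test(tab[, c(1, 3)], correct = FALSE)$p.value)
p_fisher <- c("[12]" = fisher.test(tab[, c(1, 2)])$p.value,
              "[13]" = fisher.test(tab[, c(1, 3)])$p.value)

# Local tests of the intersection hypothesis [123] (all three response rates equal)
p_chisq_123  <- chisq.test(tab, correct = FALSE)$p.value
p_fisher_123 <- fisher.test(tab)$p.value

# Closed testing: adjusted p-value = maximum over the testing set
p_adj_chisq  <- pmax(p_chisq,  p_chisq_123)
p_adj_fisher <- pmax(p_fisher, p_fisher_123)

print(rbind(p_chisq, p_adj_chisq, p_fisher, p_adj_fisher))
```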

Survival Analysis with the logrank test

This example uses the sample dataset ovarian from the package survival. The overall survival curves of the two treatments rx do not differ significantly:

For the performance subgroups ecog=1 and ecog=2, a factor “subgroups” is defined by the combinations of the performance measure ecog.ps and the treatment rx.

Then, the treatment differences within the performance subgroups ecog=1 and ecog=2 are compared, i.e. the elementary hypotheses are subgroup11=subgroup12 and subgroup21=subgroup22, or \(\{(1,2),(3,4)\}\).
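
The logic of this example can be sketched by hand with the survival package. The sketch assumes that the subgroup labels combine ecog.ps (first digit) and rx (second digit) and that the intersection hypothesis is tested by Fisher's combination rule as described for non-linear tests; the object names are hypothetical and the code is not the package's own implementation.

```r
library(survival)   # provides the ovarian data, Surv() and survdiff()

# Subgroup factor: first digit = ecog.ps, second digit = rx (assumed coding)
dat <- ovarian
dat$subgroups <- factor(paste0(dat$ecog.ps, dat$rx))
print(table(dat$subgroups))

# Logrank p-value for the treatment comparison within one performance subgroup
logrank_p <- function(d) {
  fit <- survdiff(Surv(futime, fustat) ~ rx, data = d)
  pchisq(fit$chisq, df = 1, lower.tail = FALSE)
}
p_ecog1 <- logrank_p(subset(dat, ecog.ps == 1))   # elementary hypothesis (1,2)
p_ecog2 <- logrank_p(subset(dat, ecog.ps == 2))   # elementary hypothesis (3,4)

# Intersection hypothesis {(1,2),(3,4)}: Fisher's combination of the two p-values
p_both <- pchisq(-2 * sum(log(c(p_ecog1, p_ecog2))), df = 4, lower.tail = FALSE)

# Adjusted p-values: maximum over the testing set of each elementary hypothesis
p_adj <- c("(1,2)" = max(p_ecog1, p_both), "(3,4)" = max(p_ecog2, p_both))
print(p_adj)
```

The two subgroups contain different patients, so treating the two logrank p-values as independent is reasonable here.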

Comparing means when a covariate is included

In a study with type II diabetes patients (dataset glucose), three doses of a drug are compared to a placebo. The primary variable is the change in fasting plasma glucose from baseline. Fasting plasma glucose at baseline is included in the model as a covariate (only implemented for linear and generalised linear models).

Large hypothesis trees

With an increasing number of hypotheses to test, the graphical display may become quite confusing:

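# five groups with four observations each
G <- factor(rep(1:5, each = 4))
y <- rnorm(20)
Y <- data.frame(G, y)

# hypothesis tree for six elementary hypotheses on the five groups
xxx <- IntersectHypotheses(list(1:2, c(1,3), c(1,4), c(1,5), c(2,5), c(3,4)))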
summary(xxx)
## 
## Hypotheses to be tested 
## =======================
## 
##  hyp.no level hypothesis.name
##       1     1            [12]
##       2     1            [13]
##       3     1            [14]
##       4     1            [15]
##       5     1            [25]
##       6     1            [34]
##       1     2           [123]
##       2     2           [124]
##       3     2           [125]
##       4     2        [12][34]
##       5     2           [134]
##       6     2           [135]
##       7     2        [13][25]
##       8     2           [145]
##       9     2        [14][25]
##      10     2        [15][34]
##      11     2        [25][34]
##       1     3          [1234]
##       2     3          [1235]
##       3     3         [12345]
##       4     3          [1245]
##       5     3       [125][34]
##       6     3          [1345]
##       7     3       [134][25]
##       1     4         [12345]
## 
## Connection structure of the hypotheses 
## ======================================
## 
##  Level            Connection
##      1         [12] -> [123]
##      1         [12] -> [124]
##      1         [12] -> [125]
##      1      [12] -> [12][34]
##      1         [13] -> [123]
##      1         [13] -> [134]
##      1         [13] -> [135]
##      1      [13] -> [13][25]
##      1         [14] -> [124]
##      1         [14] -> [134]
##      1         [14] -> [145]
##      1      [14] -> [14][25]
##      1         [15] -> [125]
##      1         [15] -> [135]
##      1         [15] -> [145]
##      1      [15] -> [15][34]
##      1         [25] -> [125]
##      1      [25] -> [13][25]
##      1      [25] -> [14][25]
##      1      [25] -> [25][34]
##      1      [34] -> [12][34]
##      1         [34] -> [134]
##      1      [34] -> [15][34]
##      1      [34] -> [25][34]
##      2       [123] -> [1234]
##      2       [123] -> [1235]
##      2      [123] -> [12345]
##      2       [124] -> [1234]
##      2      [124] -> [12345]
##      2       [124] -> [1245]
##      2       [125] -> [1235]
##      2      [125] -> [12345]
##      2       [125] -> [1245]
##      2    [125] -> [125][34]
##      2    [12][34] -> [1234]
##      2   [12][34] -> [12345]
##      2 [12][34] -> [125][34]
##      2       [134] -> [1234]
##      2      [134] -> [12345]
##      2       [134] -> [1345]
##      2    [134] -> [134][25]
##      2       [135] -> [1235]
##      2      [135] -> [12345]
##      2       [135] -> [1345]
##      2    [13][25] -> [1235]
##      2   [13][25] -> [12345]
##      2 [13][25] -> [134][25]
##      2      [145] -> [12345]
##      2       [145] -> [1245]
##      2       [145] -> [1345]
##      2   [14][25] -> [12345]
##      2    [14][25] -> [1245]
##      2 [14][25] -> [134][25]
##      2   [15][34] -> [12345]
##      2 [15][34] -> [125][34]
##      2    [15][34] -> [1345]
##      2   [25][34] -> [12345]
##      2 [25][34] -> [125][34]
##      2 [25][34] -> [134][25]
##      3    [12345] -> [12345]
##      3     [1234] -> [12345]
##      3     [1235] -> [12345]
##      3     [1245] -> [12345]
##      3  [125][34] -> [12345]
##      3     [1345] -> [12345]
##      3  [134][25] -> [12345]
Display(xxx)

“External” p-values

It is possible to calculate adjusted p-values from p-values that have been calculated by other functions or software:

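        # all pairwise comparisons of four groups
        # (this definition is inferred from the summary output below)
        Pairwise <- IntersectHypotheses(list(1:2,c(1,3),c(1,4),c(2,3),c(2,4),c(3,4)))
        summary(Pairwise)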
## 
## Hypotheses to be tested 
## =======================
## 
##  hyp.no level hypothesis.name
##       1     1            [12]
##       2     1            [13]
##       3     1            [14]
##       4     1            [23]
##       5     1            [24]
##       6     1            [34]
##       1     2           [123]
##       2     2           [124]
##       3     2        [12][34]
##       4     2           [134]
##       5     2        [13][24]
##       6     2        [14][23]
##       7     2           [234]
##       1     3          [1234]
## 
## Connection structure of the hypotheses 
## ======================================
## 
##  Level         Connection
##      1      [12] -> [123]
##      1      [12] -> [124]
##      1   [12] -> [12][34]
##      1      [13] -> [123]
##      1      [13] -> [134]
##      1   [13] -> [13][24]
##      1      [14] -> [124]
##      1      [14] -> [134]
##      1   [14] -> [14][23]
##      1      [23] -> [123]
##      1   [23] -> [14][23]
##      1      [23] -> [234]
##      1      [24] -> [124]
##      1   [24] -> [13][24]
##      1      [24] -> [234]
##      1   [34] -> [12][34]
##      1      [34] -> [134]
##      1      [34] -> [234]
##      2    [123] -> [1234]
##      2    [124] -> [1234]
##      2 [12][34] -> [1234]
##      2    [134] -> [1234]
##      2 [13][24] -> [1234]
##      2 [14][23] -> [1234]
##      2    [234] -> [1234]
        
        # the vector of p-values calculated by another software
        # (Example from Prof. John M. Lachin, The Biostatistics Center Rockville MD)
        
        p.val <- c(
          0.4374,
          0.6485,
          0.4103,
          0.2203,
          0.1302,
          0.6725,
          0.4704,
          0.3173,
          0.6762,
          0.7112,
          0.2866,
          0.3362,
          0.2871,
          0.4633)

        result <- Adjust_raw(Pairwise, p.value=p.val)
        
        summary(result,digits=3)    
## 
## Summary of Closed Testing Procedure
## ===================================
## 
## Elementary Hypotheses and p-values
## ----------------------------------
## 
##  Hypothesis raw p-value adj. p-value
##        [12]       0.437        0.676
##        [13]       0.648        0.711
##        [14]       0.410        0.711
##        [23]       0.220        0.470
##        [24]       0.130        0.463
##        [34]       0.672        0.711
                
# details may be documented
        
        result <- Adjust_raw(Pairwise, p.value=p.val
                             ,dataset.name="my Data", factor.name="Factor"
                             ,factor.levels=c("A","B","C","D"), model=y~Factor
                             ,test.name="my Test")
        
        summary(result,digits=3)
## 
## Summary of Closed Testing Procedure
## ===================================
## 
## Model : y ~ Factor , test : my Test 
## 
## Factor levels: 1=A, 2=B, 3=C, 4=D 
## 
## Elementary Hypotheses and p-values
## ----------------------------------
## 
##  Hypothesis raw p-value adj. p-value
##        [12]       0.437        0.676
##        [13]       0.648        0.711
##        [14]       0.410        0.711
##        [23]       0.220        0.470
##        [24]       0.130        0.463
##        [34]       0.672        0.711
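
The adjusted p-values in this summary can be re-derived by hand in base R, which may help when checking externally supplied p-values. The sketch below uses only the hypothesis names and p-values listed above; the helper `implies_pair` is a hypothetical name, and reading the testing set off the hypothesis names is a shortcut for this pairwise example, not the package's algorithm.

```r
# Hypothesis names and p-values as listed in the summaries above
hyp <- c("[12]", "[13]", "[14]", "[23]", "[24]", "[34]",
         "[123]", "[124]", "[12][34]", "[134]", "[13][24]", "[14][23]", "[234]",
         "[1234]")
p.val <- c(0.4374, 0.6485, 0.4103, 0.2203, 0.1302, 0.6725,
           0.4704, 0.3173, 0.6762, 0.7112, 0.2866, 0.3362, 0.2871,
           0.4633)
names(p.val) <- hyp

# TRUE if hypothesis `h` implies equality of groups a and b, i.e. if both
# group labels occur in the same bracketed block of `h`
implies_pair <- function(h, a, b) {
  blocks <- regmatches(h, gregexpr("[0-9]+", h))[[1]]
  any(vapply(blocks, function(s) {
    grepl(a, s, fixed = TRUE) && grepl(b, s, fixed = TRUE)
  }, logical(1)))
}

# Adjusted p-value = maximum p-value over the testing set of each elementary hypothesis
elementary <- hyp[1:6]
p.adj <- vapply(elementary, function(e) {
  groups <- strsplit(gsub("[^0-9]", "", e), "")[[1]]
  in_testing_set <- vapply(hyp, implies_pair, logical(1),
                           a = groups[1], b = groups[2])
  max(p.val[in_testing_set])
}, numeric(1))

print(round(p.adj, 3))
```

The rounded values (0.676, 0.711, 0.711, 0.470, 0.463, 0.711) agree with the adj. p-value column of the summary.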

References

Bauer, Peter. 1991. “Multiple Testing in Clinical Trials.” Statistics in Medicine 10 (4): 261–63.

Bretz, Frank, Hothorn, T., and Westfall, P. 2011. Multiple Comparisons Using R. CRC Press.

Dmitrienko, Alex, Tamhane, A., and Bretz, F. 2010. Multiple Testing Problems in Pharmaceutical Statistics. Chapman & Hall.

Marcus, Ruth, Peritz, E., and Gabriel, K.R. 1976. “On Closed Testing Procedures with Special Reference to Ordered Analysis of Variance.” Biometrika 63: 655–60.

Lenth, Russell. 2020. “Estimated Marginal Means, Aka Least-Squares Means.”