An Introduction to t_TOST

A new function for TOST with t-tests

Aaron R. Caldwell

2022-03-22

In an effort to make TOSTER more informative and easier to use, a new function, t_TOST, has been created. This function operates very similarly to base R’s t.test function, but performs three t-tests (one two-tailed and two one-tailed tests). In addition, this function has a generic method, so either two vectors or a formula (e.g., y ~ group) can be supplied. This function also makes it easier to switch between types of t-tests: all three types (two sample, one sample, and paired samples) can be performed from the same function. Moreover, the summary information and visualizations have been upgraded, which should make the decisions derived from the function more informative and user-friendly.

Also, t_TOST is not limited to equivalence tests. Minimal effects testing (MET) is now possible. MET is useful for situations where the hypothesis is that the effect lies outside the bounds (a minimal effect) and the null hypothesis is equivalence.

In the general introduction to this package we detailed how to look at old results and how to apply TOST to interpreting those results. However, in many cases users will have new data to analyze, and t_TOST can be applied directly to those data. This vignette will use the iris and sleep data sets.

data('sleep')
data('iris')

Independent Groups

For this example, we will use the sleep data. In this data set there is a grouping variable (group) and an outcome (extra).

head(sleep)
#>   extra group ID
#> 1   0.7     1  1
#> 2  -1.6     1  2
#> 3  -0.2     1  3
#> 4  -1.2     1  4
#> 5  -0.1     1  5
#> 6   3.4     1  6

We will assume the data are independent, and that we have equivalence bounds of +/- 0.5. All we need to do is provide the formula, data, and equivalence bound (low_eqbound and high_eqbound) arguments for the function to run appropriately. In addition, we can set the var.equal argument (to assume equal variances) and the paired argument (to indicate whether the data are paired). Both are logical indicators that can be set to TRUE or FALSE. The alpha is automatically set to 0.05, but this can also be adjusted by the user. The Hedges correction is also applied automatically, but this can be overridden with the bias_correction argument. The hypothesis is automatically set to “EQU” for equivalence, but if a minimal effect is of interest then “MET” can be supplied.

res1 = t_TOST(formula = extra ~ group,
              data = sleep,
              low_eqbound = -.5,
              high_eqbound = .5)

res1a = t_TOST(x = subset(sleep,group==1)$extra,
               y = subset(sleep,group==2)$extra,
               low_eqbound = -.5,
               high_eqbound = .5)

Once the function has run, we can print the results with the print command. This provides a verbose summary of the results.

print(res1)
#> 
#> Welch Two Sample t-test
#> Hypothesis Tested: Equivalence
#> Equivalence Bounds (raw):-0.500 & 0.500
#> Alpha Level:0.05
#> The equivalence test was non-significant, t(17.78) = -1.272, p = 8.9e-01
#> The null hypothesis test was non-significant, t(17.78) = -1.861, p = 7.94e-02
#> NHST: don't reject null significance hypothesis that the effect is equal to zero 
#>  TOST: don't reject null equivalence hypothesis
#> 
#> TOST Results 
#>                    t       SE       df    p.value
#> t-test     -1.860813 0.849091 17.77647 0.07939414
#> TOST Lower -1.271948 0.849091 17.77647 0.89010996
#> TOST Upper -2.449678 0.849091 17.77647 0.01245133
#> 
#> Effect Sizes 
#>                 estimate       SE  lower.ci    upper.ci conf.level
#> Raw           -1.5800000 0.849091 -3.053381 -0.10661850        0.9
#> Hedges' g(av) -0.7964846 0.497633 -1.684326 -0.06154947        0.9
#> 
#> Note: SMD confidence intervals are an approximation. See vignette("SMD_calcs").
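
To see where the two TOST rows come from, the same one-sided tests can be sketched with base R’s t.test. This is only an illustration under the assumption that each bound is tested by shifting mu and choosing a one-sided alternative; it is not the internal code of t_TOST. The object names (tost_lower, tost_upper, p_tost) are made up for this example.

```r
# Base R sketch of the two one-sided tests behind res1 (Welch two-sample case).
# Assumption: each equivalence bound is tested by shifting `mu` in t.test().
low_eqbound  <- -0.5
high_eqbound <-  0.5

# H0: difference <= lower bound  vs  H1: difference > lower bound
tost_lower <- t.test(extra ~ group, data = sleep,
                     mu = low_eqbound, alternative = "greater")

# H0: difference >= upper bound  vs  H1: difference < upper bound
tost_upper <- t.test(extra ~ group, data = sleep,
                     mu = high_eqbound, alternative = "less")

# Equivalence is only claimed when BOTH one-sided tests are significant,
# so the overall TOST p-value is the larger of the two p-values.
p_tost <- max(tost_lower$p.value, tost_upper$p.value)

round(c(t_lower = unname(tost_lower$statistic),
        t_upper = unname(tost_upper$statistic),
        p_tost  = p_tost), 3)
```

The t statistics and the larger p-value can be compared with the TOST Lower and TOST Upper rows printed above.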

Another nice feature is the generic plot method that can provide a visual summary of the results. All of the plots in this package were inspired by the concurve R package. There are two types of plots that can be produced. The first, and default, is the consonance density plot (type = "cd").

plot(res1, type = "cd")

The shading pattern can be modified with the ci_shades argument.

plot(res1, type = "cd",
     ci_shades = c(.9,.95))

Consonance plots, where all confidence intervals are plotted simultaneously, can also be produced (type = "c"). The advantage here is that multiple confidence interval lines can be plotted at once with the ci_lines argument.

plot(res1, type = "c",
     ci_lines =  c(.9,.95))

Paired Samples

To perform a paired samples TOST, the process does not change much. We can run the test the same way by providing a formula. All we need to do then is change paired to TRUE.

res2 = t_TOST(formula = extra ~ group,
              data = sleep,
              paired = TRUE,
              low_eqbound = -.5,
              high_eqbound = .5)
res2
#> 
#> Paired t-test
#> Hypothesis Tested: Equivalence
#> Equivalence Bounds (raw):-0.500 & 0.500
#> Alpha Level:0.05
#> The equivalence test was non-significant, t(9) = -2.777, p = 9.89e-01
#> The null hypothesis test was significant, t(9) = -4.062, p = 2.83e-03
#> NHST: reject null significance hypothesis that the effect is equal to zero 
#>  TOST: don't reject null equivalence hypothesis
#> 
#> TOST Results 
#>                    t        SE df      p.value
#> t-test     -4.062128 0.3889587  9 0.0028328902
#> TOST Lower -2.776644 0.3889587  9 0.9892407566
#> TOST Upper -5.347611 0.3889587  9 0.0002319027
#> 
#> Effect Sizes 
#>               estimate        SE  lower.ci   upper.ci conf.level
#> Raw          -1.580000 0.3889587 -2.293005 -0.8669947        0.9
#> Hedges' g(z) -1.230152 0.2008070 -1.848296 -0.8362302        0.9
#> 
#> Note: SMD confidence intervals are an approximation. See vignette("SMD_calcs").
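
The conf.level of 0.9 in the effect size table is not arbitrary: a TOST at alpha = 0.05 corresponds to checking whether the 1 - 2*alpha (90%) confidence interval falls entirely inside the equivalence bounds. A minimal base R sketch of that check for the paired sleep data follows; the variable names (d, bounds, ci, equivalent) are illustrative only.

```r
# Paired differences for the sleep data (group 1 minus group 2);
# rows are ordered by ID within each group, so the vectors line up.
d <- sleep$extra[sleep$group == 1] - sleep$extra[sleep$group == 2]

alpha  <- 0.05
bounds <- c(-0.5, 0.5)

# 1 - 2*alpha (here 90%) confidence interval for the mean difference
ci <- t.test(d, conf.level = 1 - 2 * alpha)$conf.int

# Equivalence at level alpha corresponds to the whole interval
# lying strictly inside the equivalence bounds.
equivalent <- ci[1] > bounds[1] && ci[2] < bounds[2]

ci
equivalent
```

The interval can be compared with the lower.ci and upper.ci columns of the Raw row above; because it extends beyond the lower bound, equivalence is not supported.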

However, we may have two vectors of paired data. In that case we can supply the vectors directly through the x and y arguments rather than using a data set and a formula. This can be demonstrated with the “iris” data.

res3 = t_TOST(x = iris$Sepal.Length,
              y = iris$Sepal.Width,
              paired = TRUE,
              low_eqbound = -1,
              high_eqbound = 1)
res3
#> 
#> Paired t-test
#> Hypothesis Tested: Equivalence
#> Equivalence Bounds (raw):-1.000 & 1.000
#> Alpha Level:0.05
#> The equivalence test was non-significant, t(149) = 22.319, p = 1e+00
#> The null hypothesis test was significant, t(149) = 34.815, p = 1.85e-73
#> NHST: reject null significance hypothesis that the effect is equal to zero 
#>  TOST: don't reject null equivalence hypothesis
#> 
#> TOST Results 
#>                   t         SE  df      p.value
#> t-test     34.81519 0.08002254 149 1.849554e-73
#> TOST Lower 47.31167 0.08002254 149 5.948643e-92
#> TOST Upper 22.31871 0.08002254 149 1.000000e+00
#> 
#> Effect Sizes 
#>              estimate         SE lower.ci upper.ci conf.level
#> Raw          2.786000 0.08002254 2.653551 2.918449        0.9
#> Hedges' g(z) 2.835487 0.25311166 2.571944 3.128367        0.9
#> 
#> Note: SMD confidence intervals are an approximation. See vignette("SMD_calcs").

We may want to perform a Minimal Effect Test with the hypothesis argument set to “MET.”

res3a = t_TOST(x = iris$Sepal.Length,
               y = iris$Sepal.Width,
               paired = TRUE,
               hypothesis = "MET",
               low_eqbound = -1,
               high_eqbound = 1)
res3a
#> 
#> Paired t-test
#> Hypothesis Tested: Minimal Effect
#> Equivalence Bounds (raw):-1.000 & 1.000
#> Alpha Level:0.05
#> The minimal effect test was significant, t(149) = 47.312, p = 1.13e-49
#> The null hypothesis test was significant, t(149) = 34.815, p = 1.85e-73
#> NHST: reject null significance hypothesis that the effect is equal to zero 
#>  TOST: reject null MET hypothesis
#> 
#> TOST Results 
#>                   t         SE  df      p.value
#> t-test     34.81519 0.08002254 149 1.849554e-73
#> TOST Lower 47.31167 0.08002254 149 1.000000e+00
#> TOST Upper 22.31871 0.08002254 149 1.130623e-49
#> 
#> Effect Sizes 
#>              estimate         SE lower.ci upper.ci conf.level
#> Raw          2.786000 0.08002254 2.653551 2.918449        0.9
#> Hedges' g(z) 2.835487 0.25311166 2.571944 3.128367        0.9
#> 
#> Note: SMD confidence intervals are an approximation. See vignette("SMD_calcs").
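
For a minimal effect test the direction of each one-sided test is reversed relative to the equivalence test. The following base R sketch illustrates this, assuming the two tests are combined by taking the smaller p-value (which is consistent with the p-value reported for the minimal effect test above). The names met_lower, met_upper, and p_met are hypothetical.

```r
x <- iris$Sepal.Length
y <- iris$Sepal.Width

# H1: mean difference is below the lower bound (-1)
met_lower <- t.test(x, y, paired = TRUE, mu = -1, alternative = "less")

# H1: mean difference is above the upper bound (+1)
met_upper <- t.test(x, y, paired = TRUE, mu = 1, alternative = "greater")

# A minimal effect is supported when EITHER one-sided test is significant,
# so this sketch reports the smaller of the two p-values.
p_met <- min(met_lower$p.value, met_upper$p.value)

c(t_lower = unname(met_lower$statistic),
  t_upper = unname(met_upper$statistic))
p_met
```

Note that the t statistics are identical to those of the equivalence test in res3; only the tail used for each p-value changes.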

Again, we can plot the effects from the t_TOST result.

plot(res3a)

One Sample t-test

In other cases we may just have a one sample test. If that is the case, all we have to do is supply the x argument for the data. For this test we may hypothesize that the mean of Sepal.Length is greater than 5.5 and less than 8.5.

res4 = t_TOST(x = iris$Sepal.Length,
              hypothesis = "EQU",
              low_eqbound = 5.5,
              high_eqbound = 8.5)
res4
#> 
#> One Sample t-test
#> Hypothesis Tested: Equivalence
#> Equivalence Bounds (raw):5.500 & 8.500
#> Alpha Level:0.05
#> The equivalence test was significant, t(149) = 5.078, p = 5.62e-07
#> The null hypothesis test was significant, t(149) = 86.425, p = 3.33e-129
#> NHST: reject null significance hypothesis that the effect is equal to zero 
#> TOST: reject null equivalence hypothesis
#> 
#> TOST Results 
#>                     t         SE  df       p.value
#> t-test      86.425375 0.06761132 149 3.331256e-129
#> TOST Lower   5.078045 0.06761132 149  5.615560e-07
#> TOST Upper -39.293225 0.06761132 149  7.971896e-81
#> 
#> Effect Sizes 
#>           estimate         SE lower.ci upper.ci conf.level
#> Raw       5.843333 0.06761132 5.731427 5.955240        0.9
#> Hedges' g 7.021013 0.42002065 6.406708 7.788227        0.9
#> 
#> Note: SMD confidence intervals are an approximation. See vignette("SMD_calcs").
plot(res4)

Only have the summary statistics? No problem!

In some cases you may only have access to the summary statistics. Therefore, we created a function, tsum_TOST, to perform the same tests based only on the summary statistics. This involves providing the function with a number of different arguments.

The results from above can be replicated with tsum_TOST.

res_tsum = tsum_TOST(
  m1 = mean(iris$Sepal.Length, na.rm=TRUE),
  sd1 = sd(iris$Sepal.Length, na.rm=TRUE),
  n1 = length(na.omit(iris$Sepal.Length)),
  hypothesis = "EQU",
  low_eqbound = 5.5,
  high_eqbound = 8.5
)

res_tsum
#> 
#> One-sample t-Test
#> Hypothesis Tested: Equivalence
#> Equivalence Bounds (raw):5.500 & 8.500
#> Alpha Level:0.05
#> The equivalence test was significant, t(149) = 5.078, p = 5.62e-07
#> The null hypothesis test was significant, t(149) = 86.425, p = 3.33e-129
#> NHST: reject null significance hypothesis that the effect is equal to zero 
#> TOST: reject null equivalence hypothesis
#> 
#> TOST Results 
#>                     t         SE  df       p.value
#> t-test      86.425375 0.06761132 149 3.331256e-129
#> TOST Lower   5.078045 0.06761132 149  5.615560e-07
#> TOST Upper -39.293225 0.06761132 149  7.971896e-81
#> 
#> Effect Sizes 
#>           estimate         SE lower.ci upper.ci conf.level
#> Raw       5.843333 0.06761132 5.731427 5.955240        0.9
#> Hedges' g 7.021013 0.42002065 6.406708 7.788227        0.9
#> 
#> Note: SMD confidence intervals are an approximation. See vignette("SMD_calcs").
plot(res_tsum)
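
To make it clear why the summary statistics are sufficient, the one-sample test statistics can be computed by hand. This is a rough base R sketch of the arithmetic, not the internals of tsum_TOST; all object names are illustrative.

```r
# Summary statistics for the one-sample case
m1  <- mean(iris$Sepal.Length)
sd1 <- sd(iris$Sepal.Length)
n1  <- length(iris$Sepal.Length)

se <- sd1 / sqrt(n1)   # standard error of the mean
df <- n1 - 1           # degrees of freedom

# One-sided t statistics against each equivalence bound
t_lower <- (m1 - 5.5) / se
t_upper <- (m1 - 8.5) / se

# Lower bound: is the mean greater than 5.5? (upper-tail p-value)
p_lower <- pt(t_lower, df, lower.tail = FALSE)
# Upper bound: is the mean less than 8.5? (lower-tail p-value)
p_upper <- pt(t_upper, df, lower.tail = TRUE)

data.frame(t = c(t_lower, t_upper),
           SE = se,
           df = df,
           p.value = c(p_lower, p_upper),
           row.names = c("TOST Lower", "TOST Upper"))
```

The resulting table can be compared with the TOST Lower and TOST Upper rows printed by res_tsum.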

Power Analysis for t-test based TOST

A new function, power_t_TOST, has been created to allow for power calculations for TOST analyses that utilize t-tests. This function uses a more accurate method than the older functions in this package and matches the results of the PASS statistics software. The exact calculations of power are based on Owen’s Q-function or on direct integration of the bivariate non-central t-distribution (inspired by the PowerTOST R package; Labes, Schütz, and Lang 2021). Approximate power is implemented via the non-central t-distribution or the ‘shifted’ central t-distribution (Diletti, Hauschke, and Steinijans 1992). The function is limited to power analyses that involve one sample, two sample, and paired sample cases. More options are available in the PowerTOST R package.

The interface for this function is quite simple and was intended to mimic the base R function power.t.test. The user must specify the two equivalence bounds and leave exactly one of alpha, power, or n unspecified (NULL); the function then solves for that quantity. The “true difference” can be set with delta, and the standard deviation (default is 1) can be set with the sd argument. Once everything is set and the function is run, it will return an object of the power.htest class.

As an example, let’s say we are looking at an equivalence study where we assume the true difference is 1 unit, the standard deviation is 2.5, and we set the equivalence bounds to +/- 2.5 units. If we want to find the sample size adequate to have 95% power at an alpha of 0.025, we enter the following:

power_t_TOST(n = NULL,
  delta = 1,
  sd = 2.5,
  low_eqbound = -2.5,
  high_eqbound = 2.5,
  alpha = .025,
  power = .95,
  type = "two.sample")
#> 
#>      Two-sample TOST power calculation 
#> 
#>           power = 0.95
#>            beta = 0.05
#>           alpha = 0.025
#>               n = 73.16747
#>           delta = 1
#>              sd = 2.5
#>          bounds = -2.5, 2.5
#> 
#> NOTE: n is number in *each* group

From the analysis above we would conclude that adequate power is achieved with 74 participants per group and 148 participants in total.
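
The rounding step, together with a rough plausibility check of the power at the rounded sample size, can be sketched in base R. The helper approx_power below is hypothetical (it is not part of TOSTER) and uses the non-central t approximation for the two-sample case, so its result will differ slightly from the exact method used by power_t_TOST.

```r
# Sample size must be a whole number, so round the solution up
n_exact     <- 73.16747            # per-group n reported above
n_per_group <- ceiling(n_exact)
n_total     <- 2 * n_per_group

# Hypothetical helper: approximate two-sample TOST power via the
# non-central t-distribution (an approximation, not the exact method).
approx_power <- function(n, delta, sd, low, high, alpha) {
  se    <- sd * sqrt(2 / n)        # SE of the difference, n per group
  df    <- 2 * n - 2
  tcrit <- qt(1 - alpha, df)       # one-sided critical value
  pw <- pt(-tcrit, df, ncp = (delta - high) / se) -
        pt( tcrit, df, ncp = (delta - low)  / se)
  max(pw, 0)                       # power cannot be negative
}

pw <- approx_power(n_per_group, delta = 1, sd = 2.5,
                   low = -2.5, high = 2.5, alpha = 0.025)

c(n_per_group = n_per_group, n_total = n_total)
pw
```

Rounding up rather than to the nearest integer ensures that the achieved power is not below the target.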

References

Diletti, E, D Hauschke, and VW Steinijans. 1992. “Sample Size Determination for Bioequivalence Assessment by Means of Confidence Intervals.” International Journal of Clinical Pharmacology, Therapy, and Toxicology 30 Suppl 1: S51–8. http://europepmc.org/abstract/MED/1601532.
Labes, Detlew, Helmut Schütz, and Benjamin Lang. 2021. PowerTOST: Power and Sample Size for (Bio)equivalence Studies. https://CRAN.R-project.org/package=PowerTOST.
Phillips, Kem F. 1990. “Power of the Two One-Sided Tests Procedure in Bioequivalence.” Journal of Pharmacokinetics and Biopharmaceutics 18 (2): 137–44. https://doi.org/10.1007/bf01063556.