Statistical tests in gaze

Loading package

library(autoReg)
library(dplyr) # for use of pipe operator `%>%`

Statistical tests for numeric variables

The gaze() function in this autoReg package perform statistical tests for compare means between/among groups. The acs data included in moonBook package is a dataset containing demographic and laboratory data of 857 patients with acute coronary syndrome(ACS).

To make a table comparing baseline characteristics, use gaze() function.

data(acs, package="moonBook")
gaze(sex~.,data=acs)
————————————————————————————————————————————————————————————————————————
  Dependent:sex        levels           Female          Male        p   
       (N)                             (N=287)        (N=570)           
————————————————————————————————————————————————————————————————————————
age               Mean ± SD             68.7 ± 10.7   60.6 ± 11.2  <.001 
cardiogenicShock  No                    275 (95.8%)     530 (93%)   .136 
                  Yes                     12 (4.2%)       40 (7%)        
entry             Femoral               119 (41.5%)   193 (33.9%)   .035 
                  Radial                168 (58.5%)   377 (66.1%)        
Dx                NSTEMI                 50 (17.4%)   103 (18.1%)   .012 
                  STEMI                  84 (29.3%)   220 (38.6%)        
                  Unstable Angina       153 (53.3%)   247 (43.3%)        
EF                Mean ± SD             56.3 ± 10.1    55.6 ± 9.4   .387 
height            Mean ± SD             153.8 ± 6.2   167.9 ± 6.1  <.001 
weight            Mean ± SD              57.2 ± 9.3   68.7 ± 10.3  <.001 
BMI               Mean ± SD              24.2 ± 3.6    24.3 ± 3.2   .611 
obesity           No                    194 (67.6%)   373 (65.4%)   .580 
                  Yes                    93 (32.4%)   197 (34.6%)        
TC                Mean ± SD            188.9 ± 51.1  183.3 ± 45.9   .124 
LDLC              Mean ± SD            117.8 ± 41.2  116.0 ± 41.1   .561 
HDLC              Mean ± SD             39.0 ± 11.5   37.8 ± 10.9   .145 
TG                Mean ± SD            119.9 ± 76.2  127.9 ± 97.3   .195 
DM                No                    173 (60.3%)   380 (66.7%)   .077 
                  Yes                   114 (39.7%)   190 (33.3%)        
HBP               No                     83 (28.9%)   273 (47.9%)  <.001 
                  Yes                   204 (71.1%)   297 (52.1%)        
smoking           Ex-smoker              49 (17.1%)   155 (27.2%)  <.001 
                  Never                 209 (72.8%)   123 (21.6%)        
                  Smoker                 29 (10.1%)   292 (51.2%)        
————————————————————————————————————————————————————————————————————————

You can make a publication-ready table with myft() function which can be used in HTML, pdf, microsoft word and powerpoint file.

gaze(sex~.,data=acs) %>% myft()

You can select the statistical method comparing means between/among groups with argument method. Possible values in methods are:

Default value is 1.

1. Comparison of two groups

Ejection fraction(EF) refers to how well your left ventricle (or right ventricle) pumps blood with each heart beat. The normal values are approximately 56-78%.

(1) Parametric method

gaze(sex~EF,data=acs)  # default: method=1 
——————————————————————————————————————————————————————————————
 Dependent:sex   levels        Female          Male       p   
      (N)                     (N=287)        (N=570)          
——————————————————————————————————————————————————————————————
EF             Mean ± SD       56.3 ± 10.1    55.6 ± 9.4  .387 
——————————————————————————————————————————————————————————————

If you want to compare EF means between males and females in acs data with parametric method, you have to compare the variances of two samples. If the variances of two groups are equal, the pooled variance is used to estimate the variance. Otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.

var.test(EF~sex,data=acs)  # F Test to Compare Two Variances

    F test to compare two variances

data:  EF by sex
F = 1.144, num df = 239, denom df = 482, p-value = 0.2214
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.9221264 1.4309581
sample estimates:
ratio of variances 
          1.143983 

The result of var.test is not significant. So we cannot reject the null hypothesis :\(H_0 : true\ ratio\ of\ variance\ is\ equal\ to\ 0\). With this result, we perform t-test using pooled variance.

t.test(EF~sex,data=acs,var.equal=TRUE)

    Two Sample t-test

data:  EF by sex
t = 0.86514, df = 721, p-value = 0.3872
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
 -0.8346856  2.1498875
sample estimates:
mean in group Female   mean in group Male 
            56.27375             55.61615 

The result of t.test is not significant(\(p=.387\)). The p value in the table is the result of this test. Alternatively, if the result of var.test() is significant, we perform t.test with the Welch approximation to the degrees of freedom.

t.test(EF~sex,data=acs) # default value: var.equal=FALSE

    Welch Two Sample t-test

data:  EF by sex
t = 0.8458, df = 449.65, p-value = 0.3981
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
 -0.8703566  2.1855585
sample estimates:
mean in group Female   mean in group Male 
            56.27375             55.61615 

(2) Non-parametric method

gaze(sex~EF,data=acs, method=2)  # method=2 forces analysis as continuous non-normal 
—————————————————————————————————————————————————————————————————————————————
 Dependent:sex     levels           Female                Male           p   
      (N)                           (N=287)              (N=570)             
—————————————————————————————————————————————————————————————————————————————
EF             Median (IQR)    59.2 (51.4 to 63.1)  57.3 (50.0 to 61.8)  .053 
—————————————————————————————————————————————————————————————————————————————

When you choose method=2, the Wilcoxon rank sum test(also known as Mann-Whitney test) is performed.

wilcox.test(EF~sex,data=acs)

    Wilcoxon rank sum test with continuity correction

data:  EF by sex
W = 63078, p-value = 0.05295
alternative hypothesis: true location shift is not equal to 0

(3) Performs test for normality

gaze(sex~EF,data=acs, method=3) 
—————————————————————————————————————————————————————————————————————————————
 Dependent:sex     levels           Female                Male           p   
      (N)                           (N=287)              (N=570)             
—————————————————————————————————————————————————————————————————————————————
EF             Median (IQR)    59.2 (51.4 to 63.1)  57.3 (50.0 to 61.8)  .053 
—————————————————————————————————————————————————————————————————————————————

When method=3, perform the Shapiro-Wilk test or the Anderson-Daring test for normality(nortest::ad.test) to decide between normal or non-normal. If the number of cases are below 5000, Shapiro-Wilk test performed. If above 5000, Anderson-Daring test for normality performed.

nrow(acs)
[1] 857
out=lm(age~sex,data=acs)
shapiro.test(resid(out))

    Shapiro-Wilk normality test

data:  resid(out)
W = 0.99343, p-value = 0.000808

The result of shapiro.test() is significant. So we perform Wilcoxon rank sum test.

2. Comparison of three or more groups

The ‘Dx’ column of acs data is diagnosis. It has three groups : Unstable Angina, NSTEMI and STEMI. You can make a table summarizing baseline characteristics among three groups. The parametric method comparing means of three or more groups is ANOVA, whereas non-parametric method is Kruskal-Wallis rank sum test.

gaze(Dx~.,data=acs) %>% myft()

(1) Parametric method

Now we focus on comparing means of age among three groups.

gaze(Dx~age,data=acs)  # default : method=1
———————————————————————————————————————————————————————————————————————————————————————
 Dependent:Dx   levels        NSTEMI          STEMI          Unstable Angina       p   
     (N)                     (N=153)         (N=304)             (N=400)               
———————————————————————————————————————————————————————————————————————————————————————
age           Mean ± SD       64.3 ± 12.3    62.1 ± 12.1              63.8 ± 11.0  .073 
———————————————————————————————————————————————————————————————————————————————————————

We can perform ANOVA as follows

out=lm(age~Dx,data=acs)
anova(out)
Analysis of Variance Table

Response: age
           Df Sum Sq Mean Sq F value  Pr(>F)  
Dx          2    715  357.62   2.624 0.07309 .
Residuals 854 116389  136.29                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

On analysis of variance table you can get the p value 0.073.

(2) Non-parametric method

gaze(Dx~age,data=acs, method=2) %>% myft()

The above p value in the table is the result of Kruskal-Wallis rank sum test.

kruskal.test(age~Dx,data=acs)

    Kruskal-Wallis rank sum test

data:  age by Dx
Kruskal-Wallis chi-squared = 4.424, df = 2, p-value = 0.1095
      if(sum(result)<=5000) out4=shapiro.test(resid(out3))
      else out4=nortest::ad.test(resid(out3))
      out5=kruskal.test(as.numeric(x),factor(y))
      p=c(out4$p.value,anova(out3)$Pr[1],out5$p.value)

(3) Performs test for normality

gaze(Dx~age,data=acs, method=3) %>% myft()

When method=3, gaze() performs normality test.

out=lm(age~Dx,data=acs)
shapiro.test(resid(out))

    Shapiro-Wilk normality test

data:  resid(out)
W = 0.99102, p-value = 4.413e-05

Since the result for normality test is significant(\(p<0.001\)), then we perform Kruskal-Wallis test.

Statistical tests for categorical variables

The statistical methods for categorical variables in gaze() are as follows:

You can choose by setting catMethod argument(default value is 2).

(1) Default method : chi-squared test with continuity correction

The default method for categorical variables is chi-squared test with Yates’s correction for continuity(https://en.wikipedia.org/wiki/Yates%27s_correction_for_continuity).

gaze(sex~Dx,data=acs) # default : catMethod=2
————————————————————————————————————————————————————————————————————
 Dependent:sex      levels           Female          Male       p   
      (N)                           (N=287)        (N=570)          
————————————————————————————————————————————————————————————————————
Dx             NSTEMI                 50 (17.4%)   103 (18.1%)  .012 
               STEMI                  84 (29.3%)   220 (38.6%)       
               Unstable Angina       153 (53.3%)   247 (43.3%)       
————————————————————————————————————————————————————————————————————

You can get same result with the following R code:

result=table(acs$Dx,acs$sex)
chisq.test(result)  # default: correct = TRUE

    Pearson's Chi-squared test

data:  result
X-squared = 8.7983, df = 2, p-value = 0.01229

(2) Chi-squared test without continuity correction

If you want to perform chi-squared test without continuity correction, just set catMethod=1. This is the default method in SPSS.

gaze(sex~Dx,data=acs, catMethod=1) # Perform chisq.test without continuity correction
————————————————————————————————————————————————————————————————————
 Dependent:sex      levels           Female          Male       p   
      (N)                           (N=287)        (N=570)          
————————————————————————————————————————————————————————————————————
Dx             NSTEMI                 50 (17.4%)   103 (18.1%)  .012 
               STEMI                  84 (29.3%)   220 (38.6%)       
               Unstable Angina       153 (53.3%)   247 (43.3%)       
————————————————————————————————————————————————————————————————————

You can get same result with the following R code:

result=table(acs$Dx,acs$sex)
chisq.test(result, correct=FALSE)  # without continuity correction

    Pearson's Chi-squared test

data:  result
X-squared = 8.7983, df = 2, p-value = 0.01229

(3) Fisher’s exact test

If you want to perform Fisher’s exact test, set the catMethod=3.

gaze(sex~Dx,data=acs, catMethod=3) # Perform Fisher's exact test
————————————————————————————————————————————————————————————————————
 Dependent:sex      levels           Female          Male       p   
      (N)                           (N=287)        (N=570)          
————————————————————————————————————————————————————————————————————
Dx             NSTEMI                 50 (17.4%)   103 (18.1%)  .012 
               STEMI                  84 (29.3%)   220 (38.6%)       
               Unstable Angina       153 (53.3%)   247 (43.3%)       
————————————————————————————————————————————————————————————————————

You can get same result with the following R code:

result=table(acs$Dx,acs$sex)
fisher.test(result)  

    Fisher's Exact Test for Count Data

data:  result
p-value = 0.01191
alternative hypothesis: two.sided

(4) Test for trend in proportions

If you want to perform test for trend in proportions, set the catMethod=4. You can perform this test only when the grouping variable has only two group(male and female for example).

gaze(sex~Dx,data=acs, catMethod=4) # Perform test for trend in proportions
————————————————————————————————————————————————————————————————————
 Dependent:sex      levels           Female          Male       p   
      (N)                           (N=287)        (N=570)          
————————————————————————————————————————————————————————————————————
Dx             NSTEMI                 50 (17.4%)   103 (18.1%)  .050 
               STEMI                  84 (29.3%)   220 (38.6%)       
               Unstable Angina       153 (53.3%)   247 (43.3%)       
————————————————————————————————————————————————————————————————————

You can get same result with the following R code:

result=table(acs$Dx,acs$sex)
result
                 
                  Female Male
  NSTEMI              50  103
  STEMI               84  220
  Unstable Angina    153  247
prop.trend.test(result[,2],rowSums(result)) 

    Chi-squared Test for Trend in Proportions

data:  result[, 2] out of rowSums(result) ,
 using scores: 1 2 3
X-squared = 3.8332, df = 1, p-value = 0.05025

Make a combining table with two or more grouping variables

You can make a combining table with two or more grouping variables.

gaze(sex+Dx~.,data=acs) %>% myft()

You can select whether or not show total column.

gaze(sex+Dx~.,data=acs,show.total=TRUE) %>% myft()

Missing data analysis

You can use gaze() for missing data analysis. Set the missing argument TRUE.

gaze(EF~.,data=acs, missing=TRUE) %>% myft()

If there is no missing data, show the table summarizing missing numbers.

gaze(sex~.,data=acs,missing=TRUE) %>% myft()
There is no missing data in column 'sex'