Using smd

Bradley Saul

2020-10-13

The smd package provides the smd() function, which computes standardized mean differences between two groups for continuous values (numeric and integer data types) and categorical values (factor, character, and logical). It also accepts matrix, list, and data.frame inputs, applying smd() to each column of a matrix or data.frame and to each item of a list. The package is based on Yang and Dalton (2012).

The smd function computes the standardized mean difference for each level \(k\) of a grouping variable relative to a reference level \(r\):

\[ d_k = \sqrt{(\bar{x}_r - \bar{x}_{k})^{\intercal}S_{rk}^{-1}(\bar{x}_r - \bar{x}_{k})} \]

where \(\bar{x}_r\) and \(\bar{x}_k\) are the sample means of the reference group \(r\) and group \(k\), and \(S_{rk}\) is the covariance matrix for the two groups. When \(x\) is categorical, \(\bar{x}\) is the vector of proportions of each category level within a group, and \(S_{rk}\) is the multinomial covariance matrix.
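As a minimal sketch of the formula above in base R (not the package's implementation), the helpers below compute \(d_k\) for a single numeric variable and for a categorical variable. The names smd_numeric_sketch and smd_categorical_sketch are hypothetical. Two choices are assumptions borrowed from the general approach of Yang and Dalton (2012) rather than stated in this vignette: \(S_{rk}\) is taken to be the average of the two within-group (co)variances, and the categorical version drops the first category level so that the covariance matrix can be inverted. The numeric helper returns a signed difference (reference minus comparison group); that sign convention is also an assumption.

```r
# Hypothetical helpers illustrating the SMD formula; not part of the smd package.

# Numeric x: with one variable the quadratic form reduces to a scaled mean difference.
# Assumption: S_rk is the average of the two group variances.
smd_numeric_sketch <- function(x, g, ref, other) {
  x_r <- x[g == ref]
  x_k <- x[g == other]
  s_rk <- (var(x_r) + var(x_k)) / 2
  (mean(x_r) - mean(x_k)) / sqrt(s_rk)  # signed; the sign convention is an assumption
}

# Categorical x: compare the vectors of category proportions.
# Assumption: drop the first level and average the two multinomial covariances.
smd_categorical_sketch <- function(x, g, ref, other) {
  x <- factor(x)
  p_r <- as.numeric(prop.table(table(x[g == ref])))[-1]
  p_k <- as.numeric(prop.table(table(x[g == other])))[-1]
  multinomial_cov <- function(p) diag(p, nrow = length(p)) - tcrossprod(p)
  s_rk <- (multinomial_cov(p_r) + multinomial_cov(p_k)) / 2
  diff <- p_r - p_k
  sqrt(drop(t(diff) %*% solve(s_rk) %*% diff))
}

# Toy data: group means 2 and 4, both group variances equal to 1.
g6 <- rep(c("A", "B"), each = 3)
d_num <- smd_numeric_sketch(c(1, 2, 3, 3, 4, 5), g6, ref = "A", other = "B")

# Toy data: proportion of "b" is 0.25 in group A and 0.75 in group B.
g8 <- rep(c("A", "B"), each = 4)
d_cat <- smd_categorical_sketch(c("a", "a", "a", "b", "a", "b", "b", "b"),
                                g8, ref = "A", other = "B")
```

On the toy data, d_num is -2 and d_cat is 0.5 / sqrt(0.1875), about 1.1547. The sketch fails if the averaged covariance matrix is singular, for example when a category has the same proportion of 0 or 1 in both groups.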

Standard errors are computed using the formula described in Hedges and Olkin (1985):

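\[ \mathrm{SE}(d_k) = \sqrt{ \frac{n_r + n_k}{n_rn_k} + \frac{d_k^2}{2(n_r + n_k)} } \]

where \(n_r\) and \(n_k\) are the numbers of observations in groups \(r\) and \(k\).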
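The standard error formula can be checked by hand. The following is a small sketch in base R; smd_se_sketch is a hypothetical helper, not a function exported by the package, and the estimates and group sizes passed to it are copied from the numeric examples in this document.

```r
# Hypothetical helper: standard error of an SMD from the Hedges and Olkin (1985) formula.
smd_se_sketch <- function(d, n_r, n_k) {
  sqrt((n_r + n_k) / (n_r * n_k) + d^2 / (2 * (n_r + n_k)))
}

# Estimates copied from the numeric examples in this document.
se_gg2 <- smd_se_sketch(d = 0.03413269, n_r = 45, n_k = 45)   # two groups of 45
se_gg3 <- smd_se_sketch(d = -0.25169577, n_r = 30, n_k = 30)  # reference A vs. B, 30 each
```

Up to rounding, se_gg2 and se_gg3 agree with the std.error values reported for gg2 (0.2108339) and for level B of gg3 (0.2592192).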

Examples

library(smd)

Numeric

set.seed(123)
xn <- rnorm(90)
gg2 <- rep(LETTERS[1:2], each = 45)
gg3 <- rep(LETTERS[1:3], each = 30)

smd(x = xn, g = gg2)
#>   term   estimate
#> 1    B 0.03413269
smd(x = xn, g = gg3)
#>   term    estimate
#> 1    B -0.25169577
#> 2    C -0.07846864
smd(x = xn, g = gg2, std.error = TRUE)
#>   term   estimate std.error
#> 1    B 0.03413269 0.2108339
smd(x = xn, g = gg3, std.error = TRUE)
#>   term    estimate std.error
#> 1    B -0.25169577 0.2592192
#> 2    C -0.07846864 0.2582982

Integers

xi <- sample(1:20, 90, replace = TRUE)
smd(x = xi, g = gg2)
#>   term  estimate
#> 1    B 0.1687339

Character

xc <- unlist(replicate(2, sort(sample(letters[1:3], 45, replace = TRUE)), simplify = FALSE))
smd(x = xc, g = gg2)
#>   term  estimate
#> 1    B 0.1946887

Factors

xf <- factor(xc)
smd(x = xf, g = gg2)
#>   term  estimate
#> 1    B 0.1946887

Logical

xl <- as.logical(rbinom(90, 1, prob = 0.5))
smd(x = xl, g = gg2)
#>   term estimate
#> 1    B        0

Matrices

mm <- cbind(xl, xl, xl, xl)
smd(x = mm, g = gg3, std.error = FALSE)
#>               xl          xl          xl          xl
#> [1,] -0.06765101 -0.06765101 -0.06765101 -0.06765101
#> [2,] -0.20203051 -0.20203051 -0.20203051 -0.20203051

Lists

ll <- list(xn = xn, xi = xi, xf = xf, xl = xl)
smd(x = ll, g = gg3)
#>   variable term    estimate
#> 1       xn    B -0.25169577
#> 2       xn    C -0.07846864
#> 3       xi    B  0.30325301
#> 4       xi    C  0.36089675
#> 5       xf    B  1.50232594
#> 6       xf    C  2.23606798
#> 7       xl    B -0.06765101
#> 8       xl    C -0.20203051

data.frames

df <- data.frame(xn, xi, xc, xf, xl)
smd(x = df, g = gg3)
#>    variable term    estimate
#> 1        xn    B -0.25169577
#> 2        xn    C -0.07846864
#> 3        xi    B  0.30325301
#> 4        xi    C  0.36089675
#> 5        xc    B  1.50232594
#> 6        xc    C  2.23606798
#> 7        xf    B  1.50232594
#> 8        xf    C  2.23606798
#> 9        xl    B -0.06765101
#> 10       xl    C -0.20203051

Using smd with dplyr

library(dplyr, warn.conflicts = FALSE)
df$g <- gg2
df %>%
  summarize_at(
    .vars = vars(dplyr::matches("^x")),
    .funs = list(smd = ~ smd(., g = g)$estimate))
#>       xn_smd    xi_smd    xc_smd    xf_smd xl_smd
#> 1 0.03413269 0.1687339 0.1946887 0.1946887      0

Other packages

See:

References

Hedges, L. V., and I. Olkin. 1985. Statistical Methods for Meta-Analysis. Academic Press.

Yang, Dongsheng, and Jarrod E. Dalton. 2012. “A Unified Approach to Measuring the Effect Size Between Two Groups Using SAS.” SAS Global Forum 2012, Paper 335: 1–6. http://www.lerner.ccf.org/qhs/software/lib/stddiff.pdf.