Using WeightIt to Estimate Balancing Weights

Noah Greifer

2022-06-24

Introduction

WeightIt contains several functions for estimating and assessing balancing weights for observational studies. These weights can be used to estimate the causal parameters of marginal structural models. I will not go into the basics of causal inference methods here. For good introductory articles, see Austin (2011), Austin and Stuart (2015), Robins, Hernán, and Brumback (2000), or Thoemmes and Ong (2016).

Typically, the analysis of an observational study might proceed as follows: identify the covariates for which balance is required; assess the quality of the data available, including missingness and measurement error; estimate weights that balance the covariates adequately; and estimate a treatment effect and corresponding standard error or confidence interval. This guide will go through all these steps for two observational studies: estimating the causal effect of a point treatment on an outcome, and estimating the causal parameters of a marginal structural model with multiple treatment periods. This is not meant to be a definitive guide, but rather an introduction to the relevant issues.

Estimating the Effect of a Point Treatment

First we will use the Lalonde data set to estimate the effect of a point treatment. We’ll use the version of the data set that resides within the cobalt package, which we will also use later on for balance assessment. Here, we are interested in the average treatment effect on the treated (ATT).

data("lalonde", package = "cobalt")
head(lalonde)
##   treat age educ   race married nodegree re74 re75       re78
## 1     1  37   11  black       1        1    0    0  9930.0460
## 2     1  22    9 hispan       0        1    0    0  3595.8940
## 3     1  30   12  black       0        0    0    0 24909.4500
## 4     1  27   11  black       0        1    0    0  7506.1460
## 5     1  33    8  black       0        1    0    0   289.7899
## 6     1  22    9  black       0        1    0    0  4056.4940

We have our outcome (re78), our treatment (treat), and the covariates for which balance is desired (age, educ, race, married, nodegree, re74, and re75). Using cobalt, we can examine the initial imbalance on the covariates:

library("cobalt")
##  cobalt (Version 4.3.2, Build Date: 2022-01-19)
bal.tab(treat ~ age + educ + race + married + nodegree + re74 + re75,
        data = lalonde, estimand = "ATT", thresholds = c(m = .05))
## Balance Measures
##                Type Diff.Un      M.Threshold.Un
## age         Contin. -0.3094 Not Balanced, >0.05
## educ        Contin.  0.0550 Not Balanced, >0.05
## race_black   Binary  0.6404 Not Balanced, >0.05
## race_hispan  Binary -0.0827 Not Balanced, >0.05
## race_white   Binary -0.5577 Not Balanced, >0.05
## married      Binary -0.3236 Not Balanced, >0.05
## nodegree     Binary  0.1114 Not Balanced, >0.05
## re74        Contin. -0.7211 Not Balanced, >0.05
## re75        Contin. -0.2903 Not Balanced, >0.05
## 
## Balance tally for mean differences
##                     count
## Balanced, <0.05         0
## Not Balanced, >0.05     9
## 
## Variable with the greatest mean difference
##  Variable Diff.Un      M.Threshold.Un
##      re74 -0.7211 Not Balanced, >0.05
## 
## Sample sizes
##     Control Treated
## All     429     185

Based on this output, we can see that all variables are imbalanced in the sense that the standardized mean differences (for continuous variables) and differences in proportion (for binary variables) exceed .05 in absolute value. In particular, re74 and re75 are quite imbalanced, which is troubling given that they are likely strong predictors of the outcome. We will estimate weights using weightit() to try to attain balance on these covariates.

First, we’ll start simple and use inverse probability weights from propensity scores estimated with logistic regression. We need to supply weightit() with the formula for the propensity score model, the data set, the estimand (ATT), and the method of estimation ("ps" for propensity score weights).

library("WeightIt")
W.out <- weightit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, estimand = "ATT", method = "ps")
W.out #print the output
## A weightit object
##  - method: "ps" (propensity score weighting)
##  - number of obs.: 614
##  - sampling weights: none
##  - treatment: 2-category
##  - estimand: ATT (focal: 1)
##  - covariates: age, educ, race, married, nodegree, re74, re75

Printing the output of weightit() displays a summary of how the weights were estimated. Let’s examine the quality of the weights using summary(). Weights with low variability are desirable because they improve the precision of the estimator. This variability is presented in several ways: by the range of the weights in each group, the coefficient of variation (standard deviation divided by the mean) and other dispersion measures of the weights in each group, and the effective sample size (ESS) computed from the weights. We want a narrow range, a small coefficient of variation, and a large ESS. What counts as small or large is mostly relative, though, and must be balanced against other constraints, including covariate balance. These metrics are most useful when comparing weighting methods, but the ESS can give a sense of how much information remains in the weighted sample on a familiar scale.

summary(W.out)
##                  Summary of weights
## 
## - Weight ranges:
## 
##            Min                                  Max
## treated 1.0000         ||                    1.0000
## control 0.0092 |---------------------------| 3.7432
## 
## - Units with 5 most extreme weights by group:
##                                            
##               5      4      3      2      1
##  treated      1      1      1      1      1
##             597    573    381    411    303
##  control 3.0301 3.0592 3.2397 3.5231 3.7432
## 
## - Weight statistics:
## 
##         Coef of Var   MAD Entropy # Zeros
## treated       0.000 0.000  -0.000       0
## control       1.818 1.289   1.098       0
## 
## - Effective Sample Sizes:
## 
##            Control Treated
## Unweighted  429.       185
## Weighted     99.82     185
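
The ESS reported above is computed from the weights alone, using Kish’s formula: ESS = (sum of the weights)^2 / (sum of the squared weights). As a quick sanity check (a sketch, not part of the original workflow), we can reproduce the control-group value by hand:

w <- W.out$weights[lalonde$treat == 0] #control-group weights
sum(w)^2 / sum(w^2) #should be close to the 99.82 reported above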

These weights have quite high variability and yield an ESS close to 100 in the control group. Let’s see if these weights managed to yield balance on our covariates.

bal.tab(W.out, stats = c("m", "v"), thresholds = c(m = .05))
## Call
##  weightit(formula = treat ~ age + educ + race + married + nodegree + 
##     re74 + re75, data = lalonde, method = "ps", estimand = "ATT")
## 
## Balance Measures
##                 Type Diff.Adj         M.Threshold V.Ratio.Adj
## prop.score  Distance  -0.0205     Balanced, <0.05      1.0324
## age          Contin.   0.1188 Not Balanced, >0.05      0.4578
## educ         Contin.  -0.0284     Balanced, <0.05      0.6636
## race_black    Binary  -0.0022     Balanced, <0.05           .
## race_hispan   Binary   0.0002     Balanced, <0.05           .
## race_white    Binary   0.0021     Balanced, <0.05           .
## married       Binary   0.0186     Balanced, <0.05           .
## nodegree      Binary   0.0184     Balanced, <0.05           .
## re74         Contin.  -0.0021     Balanced, <0.05      1.3206
## re75         Contin.   0.0110     Balanced, <0.05      1.3938
## 
## Balance tally for mean differences
##                     count
## Balanced, <0.05         9
## Not Balanced, >0.05     1
## 
## Variable with the greatest mean difference
##  Variable Diff.Adj         M.Threshold
##       age   0.1188 Not Balanced, >0.05
## 
## Effective sample sizes
##            Control Treated
## Unadjusted  429.       185
## Adjusted     99.82     185

These weights yielded very good balance on nearly all the covariates. Only age remained imbalanced, with a standardized mean difference greater than .05 and a variance ratio far from 1 (0.46, i.e., more than a twofold difference in variances between the groups). Let’s see if we can do better. We’ll choose a different method: entropy balancing (Hainmueller 2012), which guarantees exact balance on specified moments of the covariates while minimizing the negative entropy of the weights, keeping them as close to uniform as possible.

W.out <- weightit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, estimand = "ATT", method = "ebal")
summary(W.out)
##                  Summary of weights
## 
## - Weight ranges:
## 
##            Min                                  Max
## treated 1.0000    ||                         1.0000
## control 0.0188 |---------------------------| 9.4195
## 
## - Units with 5 most extreme weights by group:
##                                            
##               5      4      3      2      1
##  treated      1      1      1      1      1
##             608    381    597    303    411
##  control 7.1268 7.5013 7.9979 9.0355 9.4195
## 
## - Weight statistics:
## 
##         Coef of Var   MAD Entropy # Zeros
## treated       0.000 0.000   0.000       0
## control       1.834 1.287   1.101       0
## 
## - Effective Sample Sizes:
## 
##            Control Treated
## Unweighted  429.       185
## Weighted     98.46     185

The variability of the weights has not changed much, but let’s see if there are any gains in terms of balance:

bal.tab(W.out, stats = c("m", "v"), thresholds = c(m = .05))
## Call
##  weightit(formula = treat ~ age + educ + race + married + nodegree + 
##     re74 + re75, data = lalonde, method = "ebal", estimand = "ATT")
## 
## Balance Measures
##                Type Diff.Adj     M.Threshold V.Ratio.Adj
## age         Contin.        0 Balanced, <0.05      0.4097
## educ        Contin.        0 Balanced, <0.05      0.6636
## race_black   Binary        0 Balanced, <0.05           .
## race_hispan  Binary       -0 Balanced, <0.05           .
## race_white   Binary       -0 Balanced, <0.05           .
## married      Binary       -0 Balanced, <0.05           .
## nodegree     Binary       -0 Balanced, <0.05           .
## re74        Contin.       -0 Balanced, <0.05      1.3264
## re75        Contin.       -0 Balanced, <0.05      1.3350
## 
## Balance tally for mean differences
##                     count
## Balanced, <0.05         9
## Not Balanced, >0.05     0
## 
## Variable with the greatest mean difference
##  Variable Diff.Adj     M.Threshold
##   married       -0 Balanced, <0.05
## 
## Effective sample sizes
##            Control Treated
## Unadjusted  429.       185
## Adjusted     98.46     185

Indeed, we have achieved exact balance on the means of the covariates. However, the variance ratio for age remains far from 1. We could continue to try to adjust for this imbalance, but if there is reason to believe it is unlikely to affect the outcome, it may be best to leave it as is. (You can try adding I(age^2) to the formula and see what changes this causes; a sketch appears below.)
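
Here is a minimal sketch of that suggestion (output not shown). Note that weightit() also accepts a moments argument for entropy balancing, which requests balance on higher-order moments of all covariates at once; we keep the original W.out for the effect estimation that follows.

#a sketch: also balance the second moment of age
W.out2 <- weightit(treat ~ age + I(age^2) + educ + race + married +
                     nodegree + re74 + re75,
                   data = lalonde, estimand = "ATT", method = "ebal")
#alternatively: weightit(..., method = "ebal", moments = 2)
bal.tab(W.out2, stats = c("m", "v")) #check the effect on balance and the ESS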

Now that we have our weights stored in W.out, let’s extract them and estimate our treatment effect.

library("survey")
#create a survey design object that incorporates the estimated weights
d.w <- svydesign(~1, weights = W.out$weights, data = lalonde)
#fit the weighted outcome model; the coefficient on treat estimates the ATT
fit <- svyglm(re78 ~ treat, design = d.w)
coef(fit)
## (Intercept)       treat 
##    5075.929    1273.215

Now let’s do some inference. Although some authors recommend using “robust” sandwich standard errors to adjust for the weights (Robins, Hernán, and Brumback 2000; Hainmueller 2012), others argue these can be misleading and recommend bootstrapping instead (Reifeis and Hudgens 2020; Chan, Yam, and Zhang 2016). We’ll examine both approaches.

svyglm() in the survey package produces robust standard errors, so we can use summary() to view the standard error of the effect estimate.

#Robust standard errors and confidence intervals
summary(fit)
## 
## Call:
## svyglm(formula = re78 ~ treat, design = d.w)
## 
## Survey design:
## svydesign(~1, weights = W.out$weights, data = lalonde)
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5075.9      589.4   8.612   <2e-16 ***
## treat         1273.2      825.1   1.543    0.123    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 45038738)
## 
## Number of Fisher Scoring iterations: 2
confint(fit)
##                2.5 %   97.5 %
## (Intercept) 3918.409 6233.449
## treat       -347.069 2893.498

Our confidence interval for treat contains 0, so there isn’t evidence that treat has an effect on re78.

Next let’s use bootstrapping to estimate confidence intervals. We don’t need svyglm() here; we can simply use glm() (or lm()) to compute the effect estimate in each bootstrap sample, because we are not relying on the model-based standard errors and the treatment effect estimates will be the same. Note that the weights are re-estimated within each bootstrap sample so that the uncertainty in the weight estimation is reflected in the intervals.

#Bootstrapping
library("boot")
est.fun <- function(data, index) {
  #re-estimate the weights within each bootstrap sample
  W.out <- weightit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                    data = data[index,], estimand = "ATT", method = "ebal")
  #fit the weighted outcome model and return the effect estimate
  fit <- glm(re78 ~ treat, data = data[index,], weights = W.out$weights)
  return(coef(fit)["treat"])
}
boot.out <- boot(data = lalonde, statistic = est.fun, R = 999)
boot.ci(boot.out, type = "bca") #type shouldn't matter so much
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 999 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = boot.out, type = "bca")
## 
## Intervals : 
## Level       BCa          
## 95%   (-421, 2886 )  
## Calculations and Intervals on Original Scale

In this case, our confidence intervals were similar. Bootstrapping can take some time, especially with weight estimation methods that take longer, such as SuperLearner (method = "super"), covariate balancing propensity score estimation (method = "cbps"), or generalized boosted modeling (method = "gbm").

If we wanted to produce a “doubly robust” treatment effect estimate, we could add baseline covariates to the glm() (or svyglm()) model (in both the original effect estimation and the confidence interval estimation), as sketched below.
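
As a minimal sketch of that approach (using the entropy balancing weights already in d.w; estimates not shown), the hypothetical fit.dr below simply adds the baseline covariates to the weighted outcome model:

#doubly robust estimate: weighted outcome model adjusted for covariates
fit.dr <- svyglm(re78 ~ treat + age + educ + race + married +
                   nodegree + re74 + re75, design = d.w)
coef(fit.dr)["treat"] #the treatment effect estimate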

Estimating the Effect of a Longitudinal Treatment

WeightIt can also estimate weights for marginal structural models with longitudinal (time-varying) treatments. This time, we’ll use the sample data set from twang to estimate our weights. Data must be in “wide” format; to go from long to wide, see the example at ?weightitMSM.

data("iptwExWide", package = "twang")
head(iptwExWide)
##      outcome gender age       use0       use1       use2 tx1 tx2 tx3
## 1 -0.2782802      0  43  1.1349651  0.4674825  0.3174825   1   1   1
## 2  0.5319329      0  50  1.1119318  0.4559659  0.4059659   1   0   1
## 3 -0.8173614      1  36 -0.8707776 -0.5353888 -0.5853888   1   0   0
## 4 -0.1530853      1  63  0.2107316  0.0053658 -0.1446342   1   1   1
## 5 -0.7344267      0  24  0.0693956 -0.0653022 -0.1153022   1   0   1
## 6 -0.8519376      1  20 -1.6626489 -0.9313244 -1.0813244   1   1   1

We have our outcome variable (outcome), our time-stable baseline variables (gender and age), our time-varying covariates (use0, measured before tx1; use1, measured before tx2; and use2, measured before tx3), and our three time-varying treatment variables (tx1, tx2, and tx3). We are interested in the joint, unique causal effects of the treatment at each period on the outcome. At each treatment time point, we need to achieve balance on all variables measured prior to that treatment, including previous treatments.

Using cobalt, we can examine the initial imbalance at each time point and overall:

library("cobalt") #if not already attached
bal.tab(list(tx1 ~ age + gender + use0,
             tx2 ~ tx1 + use1 + age + gender + use0,
             tx3 ~ tx2 + use2 + tx1 + use1 + age + gender + use0),
        data = iptwExWide, stats = c("m", "ks"), thresholds = c(m = .05),
        which.time = .all)
## Balance by Time Point
## 
##  - - - Time: 1 - - - 
## Balance Measures
##           Type Diff.Un      M.Threshold.Un  KS.Un
## age    Contin.  0.3799 Not Balanced, >0.05 0.2099
## gender  Binary  0.2945 Not Balanced, >0.05 0.2945
## use0   Contin.  0.2668 Not Balanced, >0.05 0.1681
## 
## Balance tally for mean differences
##                     count
## Balanced, <0.05         0
## Not Balanced, >0.05     3
## 
## Variable with the greatest mean difference
##  Variable Diff.Un      M.Threshold.Un
##       age  0.3799 Not Balanced, >0.05
## 
## Sample sizes
##     Control Treated
## All     294     706
## 
##  - - - Time: 2 - - - 
## Balance Measures
##           Type Diff.Un      M.Threshold.Un  KS.Un
## tx1     Binary  0.1695 Not Balanced, >0.05 0.1695
## use1   Contin.  0.0848 Not Balanced, >0.05 0.0763
## age    Contin.  0.2240 Not Balanced, >0.05 0.1331
## gender  Binary  0.1927 Not Balanced, >0.05 0.1927
## use0   Contin.  0.1169 Not Balanced, >0.05 0.0913
## 
## Balance tally for mean differences
##                     count
## Balanced, <0.05         0
## Not Balanced, >0.05     5
## 
## Variable with the greatest mean difference
##  Variable Diff.Un      M.Threshold.Un
##       age   0.224 Not Balanced, >0.05
## 
## Sample sizes
##     Control Treated
## All     492     508
## 
##  - - - Time: 3 - - - 
## Balance Measures
##           Type Diff.Un      M.Threshold.Un  KS.Un
## tx2     Binary  0.2423 Not Balanced, >0.05 0.2423
## use2   Contin.  0.1087 Not Balanced, >0.05 0.1161
## tx1     Binary  0.1071 Not Balanced, >0.05 0.1071
## use1   Contin.  0.1662 Not Balanced, >0.05 0.1397
## age    Contin.  0.3431 Not Balanced, >0.05 0.1863
## gender  Binary  0.1532 Not Balanced, >0.05 0.1532
## use0   Contin.  0.1859 Not Balanced, >0.05 0.1350
## 
## Balance tally for mean differences
##                     count
## Balanced, <0.05         0
## Not Balanced, >0.05     7
## 
## Variable with the greatest mean difference
##  Variable Diff.Un      M.Threshold.Un
##       age  0.3431 Not Balanced, >0.05
## 
## Sample sizes
##     Control Treated
## All     415     585
##  - - - - - - - - - - -

bal.tab() indicates substantial imbalance on most covariates at most time points, so we need to do some work to eliminate that imbalance in our weighted data set. We’ll use the weightitMSM() function to specify our weight models. The syntax is similar both to that of weightit() for point treatments and to that of bal.tab() for longitudinal treatments. We’ll use method = "ps" and stabilize = TRUE for stabilized propensity score weights estimated using logistic regression.

Wmsm.out <- weightitMSM(list(tx1 ~ age + gender + use0,
                             tx2 ~ tx1 + use1 + age + gender + use0,
                             tx3 ~ tx2 + use2 + tx1 + use1 + age + gender + use0),
                        data = iptwExWide, method = "ps",
                        stabilize = TRUE)
Wmsm.out
## A weightitMSM object
##  - method: "ps" (propensity score weighting)
##  - number of obs.: 1000
##  - sampling weights: none
##  - number of time points: 3 (tx1, tx2, tx3)
##  - treatment: 
##     + time 1: 2-category
##     + time 2: 2-category
##     + time 3: 2-category
##  - covariates: 
##     + baseline: age, gender, use0
##     + after time 1: tx1, use1, age, gender, use0
##     + after time 2: tx2, use2, tx1, use1, age, gender, use0
##  - stabilized; stabilization factors:
##     + baseline: (none)
##     + after time 1: tx1
##     + after time 2: tx1, tx2, tx1:tx2

No matter which method is selected, weightitMSM() estimates separate weights for each time period and then takes the product of each unit’s weights across periods to arrive at the final estimated weights (a rough sketch of this computation appears below). Printing the output of weightitMSM() provides some details about the function call and the output. We can take a look at the quality of the weights with summary(), just as we could for point treatments.
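
For intuition, here is a rough sketch of that computation for this example, assuming simple logistic regression propensity score models at each period (this is the unstabilized version, not the package’s exact code):

#propensity scores at each time point
p1 <- glm(tx1 ~ age + gender + use0,
          data = iptwExWide, family = binomial)$fitted.values
p2 <- glm(tx2 ~ tx1 + use1 + age + gender + use0,
          data = iptwExWide, family = binomial)$fitted.values
p3 <- glm(tx3 ~ tx2 + use2 + tx1 + use1 + age + gender + use0,
          data = iptwExWide, family = binomial)$fitted.values
#inverse probability weights for each period, then their product
w1 <- with(iptwExWide, ifelse(tx1 == 1, 1/p1, 1/(1 - p1)))
w2 <- with(iptwExWide, ifelse(tx2 == 1, 1/p2, 1/(1 - p2)))
w3 <- with(iptwExWide, ifelse(tx3 == 1, 1/p3, 1/(1 - p3)))
w <- w1 * w2 * w3 #unstabilized analog of Wmsm.out$weights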

summary(Wmsm.out)
##                  Summary of weights
## 
##                        Time 1                       
##                  Summary of weights
## 
## - Weight ranges:
## 
##            Min                                  Max
## treated 0.4767  |---------|                  3.8963
## control 0.2900 |---------------------------| 8.8680
## 
## - Units with 5 most extreme weights by group:
##                                            
##             348    951    715    657    442
##  treated 2.8052 2.9868 3.1041 3.3316 3.8963
##             206    518     95    282    547
##  control  3.405 3.6434  4.687  5.121  8.868
## 
## - Weight statistics:
## 
##         Coef of Var   MAD Entropy # Zeros
## treated       0.420 0.281   0.073       0
## control       0.775 0.423   0.183       0
## 
## - Mean of Weights = 1
## 
## - Effective Sample Sizes:
## 
##            Control Treated
## Unweighted  294.    706.  
## Weighted    183.85  600.26
## 
##                        Time 2                       
##                  Summary of weights
## 
## - Weight ranges:
## 
##            Min                                  Max
## treated 0.3911 |----------|                  3.8963
## control 0.2900 |---------------------------| 8.8680
## 
## - Units with 5 most extreme weights by group:
##                                            
##             951    980    715    657    442
##  treated 2.9868 3.0427 3.1041 3.3316 3.8963
##             206    518     95    282    547
##  control  3.405 3.6434  4.687  5.121  8.868
## 
## - Weight statistics:
## 
##         Coef of Var   MAD Entropy # Zeros
## treated       0.482 0.333   0.096       0
## control       0.598 0.309   0.113       0
## 
## - Mean of Weights = 1
## 
## - Effective Sample Sizes:
## 
##            Control Treated
## Unweighted  492.    508.  
## Weighted    362.67  412.39
## 
##                        Time 3                       
##                  Summary of weights
## 
## - Weight ranges:
## 
##            Min                                  Max
## treated 0.4767  |---------|                  3.8963
## control 0.2900 |---------------------------| 8.8680
## 
## - Units with 5 most extreme weights by group:
##                                           
##             109    715    657   206    442
##  treated 3.0238 3.1041 3.3316 3.405 3.8963
##             980    518     95   282    547
##  control 3.0427 3.6434  4.687 5.121  8.868
## 
## - Weight statistics:
## 
##         Coef of Var   MAD Entropy # Zeros
## treated       0.488 0.337   0.097       0
## control       0.609 0.300   0.115       0
## 
## - Mean of Weights = 1
## 
## - Effective Sample Sizes:
## 
##            Control Treated
## Unweighted   415.   585.  
## Weighted     302.9  472.55

Displayed are summaries of how the weights perform at each time point with respect to variability. Next, we’ll examine how well they perform with respect to covariate balance.

bal.tab(Wmsm.out, stats = c("m", "ks"), thresholds = c(m = .05),
        which.time = .none)
## Call
##  weightitMSM(formula.list = list(tx1 ~ age + gender + use0, tx2 ~ 
##     tx1 + use1 + age + gender + use0, tx3 ~ tx2 + use2 + tx1 + 
##     use1 + age + gender + use0), data = iptwExWide, method = "ps", 
##     stabilize = TRUE)
## 
## Balance summary across all time points
##              Times     Type Max.Diff.Adj         M.Threshold Max.KS.Adj
## prop.score 1, 2, 3 Distance       0.3217                         0.1748
## age        1, 2, 3  Contin.       0.0153     Balanced, <0.05     0.0850
## gender     1, 2, 3   Binary       0.0214     Balanced, <0.05     0.0214
## use0       1, 2, 3  Contin.       0.0549 Not Balanced, >0.05     0.0952
## tx1           2, 3   Binary       0.1544 Not Balanced, >0.05     0.1544
## use1          2, 3  Contin.       0.0495     Balanced, <0.05     0.0547
## tx2              3   Binary       0.2396 Not Balanced, >0.05     0.2396
## use2             3  Contin.       0.1012 Not Balanced, >0.05     0.0697
## 
## Balance tally for mean differences
##                     count
## Balanced, <0.05         3
## Not Balanced, >0.05     4
## 
## Variable with the greatest mean difference
##  Variable Max.Diff.Adj         M.Threshold
##       tx2       0.2396 Not Balanced, >0.05
## 
## Effective sample sizes
##  - Time 1
##            Control Treated
## Unadjusted  294.    706.  
## Adjusted    183.85  600.26
##  - Time 2
##            Control Treated
## Unadjusted  492.    508.  
## Adjusted    362.67  412.39
##  - Time 3
##            Control Treated
## Unadjusted   415.   585.  
## Adjusted     302.9  472.55

By setting which.time = .none in bal.tab(), we can focus on the overall balance assessment, which displays the greatest imbalance for each covariate across time points. The weights have substantially reduced imbalance, though a few variables, notably the prior treatments tx1 and tx2, remain above the .05 threshold on mean differences; in practice we might refine the weight models further, but we’ll proceed to effect estimation for this demonstration. Now we can estimate our treatment effects. We’ll sequentially simplify our model by checking whether interaction terms are needed (implying that specific patterns of treatment yield different outcomes), then by checking whether different coefficients are needed for the treatments (implying that outcomes depend on which treatments are received).

library("survey")
d.w.msm <- svydesign(~1, weights = Wmsm.out$weights,
                     data = iptwExWide)
full.fit <- svyglm(outcome ~ tx1*tx2*tx3, design = d.w.msm)
main.effects.fit <- svyglm(outcome ~ tx1 + tx2 + tx3, design = d.w.msm)
anova(full.fit, main.effects.fit)
## Working (Rao-Scott+F) LRT for tx1:tx2 tx1:tx3 tx2:tx3 tx1:tx2:tx3
##  in svyglm(formula = outcome ~ tx1 * tx2 * tx3, design = d.w.msm)
## Working 2logLR =  7.689984 p= 0.10639 
## (scale factors:  1.3 1.1 0.96 0.65 );  denominator df= 992

Based on the non-significant p-value, we don’t need to assume that specific patterns of treatment yield different outcomes; it is enough to model which treatments were received or how many were received. Next we’ll narrow down these options by comparing the main-effects fit to one that constrains the three coefficients to be equal (implying that the cumulative number of treatments received is what matters), as Robins, Hernán, and Brumback (2000) describe.

cum.fit <- svyglm(outcome ~ I(tx1+tx2+tx3), design = d.w.msm)
anova(main.effects.fit, cum.fit)
## Working (Rao-Scott+F) LRT for tx1 tx2 tx3 - I(tx1 + tx2 + tx3)
##  in svyglm(formula = outcome ~ tx1 + tx2 + tx3, design = d.w.msm)
## Working 2logLR =  1.840286 p= 0.40005 
## (scale factors:  1 0.96 );  denominator df= 996
anova(full.fit, cum.fit)
## Working (Rao-Scott+F) LRT for tx1 tx2 tx3 tx1:tx2 tx1:tx3 tx2:tx3 tx1:tx2:tx3 - I(tx1 + tx2 + tx3)
##  in svyglm(formula = outcome ~ tx1 * tx2 * tx3, design = d.w.msm)
## Working 2logLR =  9.504989 p= 0.15137 
## (scale factors:  1.3 1.2 1.1 1 0.73 0.58 );  denominator df= 992

Based on the non-significant p-values, we can assume the effects of the individual treatments are close enough to be treated as the same, indicating that the cumulative number of treatments received is the relevant predictor of the outcome. Now we can examine that treatment effect with summary() and confint().

summary(cum.fit)
## 
## Call:
## svyglm(formula = outcome ~ I(tx1 + tx2 + tx3), design = d.w.msm)
## 
## Survey design:
## svydesign(~1, weights = Wmsm.out$weights, data = iptwExWide)
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         0.13072    0.05412   2.416   0.0159 *  
## I(tx1 + tx2 + tx3) -0.15157    0.02734  -5.545 3.77e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.5307998)
## 
## Number of Fisher Scoring iterations: 2
confint(cum.fit)
##                          2.5 %      97.5 %
## (Intercept)         0.02452672  0.23691596
## I(tx1 + tx2 + tx3) -0.20520895 -0.09792381

For each additional treatment received, the outcome is expected to decrease by 0.15 points. The confidence interval excludes 0, so there is evidence of a treatment effect in the population.
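
To make that interpretation concrete, we can use the coefficients above to contrast the expected outcomes under “never treated” and “treated at all three time points” (a simple illustration, not additional output from the analysis):

b <- coef(cum.fit)
#difference between 3 treatments and 0 treatments: 3 times the coefficient
unname((b[1] + 3*b[2]) - b[1]) #about 3 * -0.152 = -0.45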

There is more we can do, as well. We could have fit different types of models to estimate the weights, and, beyond setting stabilize = TRUE as we did, we could have customized the stabilization factors using num.formula (see Cole and Hernán (2008) for more details on stabilization). There are also other ways of computing confidence intervals for our effect estimates, such as bootstrapping, sketched below (although model comparison is most straightforward with the survey-based method we used).
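
For instance, here is a sketch of a bootstrap analogous to the point-treatment example above, re-estimating the weights in each resample (this can be slow; the hypothetical est.fun.msm targets the cumulative-effect coefficient chosen above):

library("boot")
est.fun.msm <- function(data, index) {
  d <- data[index,]
  #re-estimate the stabilized weights within each bootstrap sample
  Wmsm <- weightitMSM(list(tx1 ~ age + gender + use0,
                           tx2 ~ tx1 + use1 + age + gender + use0,
                           tx3 ~ tx2 + use2 + tx1 + use1 + age + gender + use0),
                      data = d, method = "ps", stabilize = TRUE)
  #fit the weighted cumulative-effect model and return the coefficient
  fit <- glm(outcome ~ I(tx1 + tx2 + tx3), data = d, weights = Wmsm$weights)
  coef(fit)["I(tx1 + tx2 + tx3)"]
}
boot.out.msm <- boot(data = iptwExWide, statistic = est.fun.msm, R = 999)
boot.ci(boot.out.msm, type = "perc")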

References

Austin, Peter C. 2011. “An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies.” Multivariate Behavioral Research 46 (3): 399–424. https://doi.org/10.1080/00273171.2011.568786.
Austin, Peter C., and Elizabeth A. Stuart. 2015. “Moving Towards Best Practice When Using Inverse Probability of Treatment Weighting (IPTW) Using the Propensity Score to Estimate Causal Treatment Effects in Observational Studies.” Statistics in Medicine 34 (28): 3661–79. https://doi.org/10.1002/sim.6607.
Chan, Kwun Chuen Gary, Sheung Chi Phillip Yam, and Zheng Zhang. 2016. “Globally Efficient Non-Parametric Inference of Average Treatment Effects by Empirical Balancing Calibration Weighting.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78 (3): 673–700. https://doi.org/10.1111/rssb.12129.
Cole, Stephen R., and Miguel A. Hernán. 2008. “Constructing Inverse Probability Weights for Marginal Structural Models.” American Journal of Epidemiology 168 (6): 656–64. https://doi.org/10.1093/aje/kwn164.
Hainmueller, Jens. 2012. “Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies.” Political Analysis 20 (1): 25–46. https://doi.org/10.1093/pan/mpr025.
Reifeis, Sarah A., and Michael G. Hudgens. 2020. “On Variance of the Treatment Effect in the Treated Using Inverse Probability Weighting.” arXiv:2011.11874 [Stat], November. https://arxiv.org/abs/2011.11874.
Robins, James M., Miguel Ángel Hernán, and Babette Brumback. 2000. “Marginal Structural Models and Causal Inference in Epidemiology.” Epidemiology 11 (5): 550–60. https://doi.org/10.2307/3703997.
Thoemmes, Felix J., and Anthony D. Ong. 2016. “A Primer on Inverse Probability of Treatment Weighting and Marginal Structural Models.” Emerging Adulthood 4 (1): 40–59. https://doi.org/10.1177/2167696815621645.