CausalQueries

R-CMD-check CRAN status

CausalQueries is a package that lets you declare binary causal models, update beliefs about causal types given data and calculate arbitrary estimands. Model definition makes use of dagitty functionality. Updating is implemented in stan.

See here for a guide to using CausalQueries along with many examples of causal models

Installation

To install the latest stable release of CausalQueries:

install.packages("CausalQueries")

To install the latest development release :

install.packages("remotes")
remotes::install_github("macartan/CausalQueries")

Causal models

Causal models are defined by:

A wrinkle:

Inference

Our goal is to form beliefs over parameters but also over more substantive estimands:

Example

Here is an example of a model in which X causes M and M causes Y. There is, in addition, unobservable confounding between X and Y. This is an example of a model in which you might use information on M to figure out whether X caused Y.

The DAG is defined using dagitty syntax like this:

model <- make_model("X -> M -> Y")

To add the confounding we have to allow an additional parameter that allows a possibly different assignment probability for X given a causal type for Y.

model <- set_confound(model, list(X = "Y[X=1] == 1"))

We then set priors thus:

model <- set_priors(model, distribution = "jeffreys")

You can plot the dag, making use of functions in the dagitty package.

plot(model)

You can draw data from the model, like this:

data <- make_data(model, n = 10)

Updating is done like this:

updated_model <- update_model(model, data)

Finally you can calculate an estimand of interest like this:

CoE <- query_distribution(
                   model = updated_model, 
                   using = "posteriors",
                   query = "Y[X=0] == 0",
                   subset = "X==1 & Y==1"
                   )

This uses the posterior distribution and the model to assess the “causes of effects” estimand: the probability that X=1 was the cause of Y=1 in those cases in which X=1 and Y=1. The approach is to imagine a set of “do” operations on the model, that control the level of X and to inquire about the level of Y given these operations, and then to assess how likely is is that Y would be 0 if X were fixed at 0 within a set that naturally take on particular values of X and Y. By the same token this posterior can be calculated conditional on observations of M, allowing an assessment of how data on mediators alters inference about the causes of effects.

Credits etc

The approach used in CausalQueries is a generalization of the biqq models described in “Mixing Methods: A Bayesian Approach” (Humphreys and Jacobs, 2015, https://doi.org/10.1017/S0003055415000453). The conceptual extension makes use of work on probabilistic causal models described in Pearl’s Causality (Pearl, 2009, https://doi.org/10.1017/CBO9780511803161). The approach to generating a generic stan function that can take data from arbitrary models was developed in key contributions by Jasper Cooper (http://jasper-cooper.com/) and Georgiy Syunyaev (http://gsyunyaev.com/). Lily Medina (https://lilymedina.github.io/) did the magical work of pulling it all together and developing approaches to characterizing confounding and defining estimands. Julio Solis has done wonders to simplify the specification of priors.