psychtm: A package for text mining in psychological research

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

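The goal of psychtm is to make text mining models and methods accessible to social science researchers, particularly in psychology. As the example below shows, the package lets users prepare text documents, fit a supervised topic model with covariates (SLDAX), and summarize the regression relationships from the estimated model.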

Installation

Once on CRAN, install the package as usual:

install.packages("psychtm")

Alternatively, you can install the package from GitHub, which first requires devtools:

install.packages("devtools")

Option 1: Install the latest stable version from GitHub

devtools::install_github("ktw5691/psychtm")

Option 2: Install the latest development snapshot

devtools::install_github("ktw5691/psychtm@devel")
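After either installation route, it can help to confirm that R can actually find the package. The following is a minimal sketch using only base R and the bundled utils package; the variable names `pkg` and `installed` are illustrative and not part of psychtm.

```r
# Name of the package to look for
pkg <- "psychtm"

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
installed <- requireNamespace(pkg, quietly = TRUE)

if (installed) {
  # Report the installed version so it can be quoted in bug reports
  message(pkg, " ", as.character(utils::packageVersion(pkg)), " is installed")
} else {
  message(pkg, " is not installed; see the installation options above")
}
```

The check only reports status and does not attempt an installation, so it is safe to run repeatedly.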

Example

This basic example shows how to (1) prepare text documents stored in a data frame; (2) fit a supervised topic model with covariates (SLDAX); and (3) summarize the regression relationships from the estimated SLDAX model.

library(psychtm)
library(lda) # Required if using `prep_docs()`

data(teacher_rate)  # Synthetic student ratings of instructors
docs_vocab <- prep_docs(teacher_rate, "doc")
vocab_len <- length(docs_vocab$vocab)
fit_sldax <- gibbs_sldax(rating ~ I(grade - 1),
                         data = teacher_rate,
                         docs = docs_vocab$documents,
                         V = vocab_len,
                         K = 2,
                         model = "sldax")
eta_post <- post_regression(fit_sldax)
summary(eta_post)
#> 
#> Iterations = 1:100
#> Thinning interval = 1 
#> Number of chains = 1 
#> Sample size per chain = 100 
#> 
#> 1. Empirical mean and standard deviation for each variable,
#>    plus standard error of the mean:
#> 
#>                 Mean       SD  Naive SE Time-series SE
#> I(grade - 1) -0.2656 0.007307 0.0007307      0.0007307
#> topic1        4.6165 0.122216 0.0122216      0.0804883
#> topic2        4.8189 0.034301 0.0034301      0.0034301
#> effect_t1    -0.2024 0.134106 0.0134106      0.0884898
#> effect_t2     0.2024 0.134106 0.0134106      0.0884898
#> sigma2        1.1422 0.028296 0.0028296      0.0028296
#> 
#> 2. Quantiles for each variable:
#> 
#>                  2.5%     25%     50%     75%    97.5%
#> I(grade - 1) -0.27849 -0.2711 -0.2659 -0.2601 -0.25175
#> topic1        4.34365  4.5709  4.6584  4.6945  4.76228
#> topic2        4.75032  4.7994  4.8181  4.8420  4.87593
#> effect_t1    -0.51412 -0.2639 -0.1828 -0.1086 -0.01216
#> effect_t2     0.01216  0.1086  0.1828  0.2639  0.51412
#> sigma2        1.08793  1.1245  1.1445  1.1599  1.20649
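As a quick sanity check on the summary above, the topic contrasts can be reproduced by hand. With K = 2, the printed posterior means are consistent with `effect_t1` being `topic1` minus `topic2` (and `effect_t2` its negative). That reading is an assumption inferred from the output rather than something stated in this README. The sketch below uses only base R, with values copied by hand from the tables above.

```r
# Posterior means copied by hand from the summary table above
post_mean <- c(topic1 = 4.6165, topic2 = 4.8189)

# Assumed contrast for K = 2: each topic versus the other topic
effect_t1 <- post_mean[["topic1"]] - post_mean[["topic2"]]
effect_t2 <- post_mean[["topic2"]] - post_mean[["topic1"]]

# Compare with the printed effect_t1 and effect_t2 means (-0.2024 and 0.2024)
print(round(c(effect_t1 = effect_t1, effect_t2 = effect_t2), 4))

# 2.5% and 97.5% quantiles for effect_t1, copied from the quantile table
ci_t1 <- c(lower = -0.51412, upper = -0.01216)

# TRUE when the 95% interval lies entirely on one side of zero
excludes_zero <- ci_t1[["lower"]] > 0 || ci_t1[["upper"]] < 0
print(excludes_zero)
```

Here the 95% interval for `effect_t1` lies entirely below zero. Under the assumed contrast, this suggests that documents loading on topic 1 are associated with slightly lower ratings than those loading on topic 2 in this synthetic data set.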

For a more detailed walkthrough of the package's key functionality, the vignette(s) are a good starting point:

browseVignettes("psychtm")

How to Cite the Package

Wilcox, K. T., Jacobucci, R., Zhang, Z., & Ammerman, B. A. (2021). Supervised latent Dirichlet allocation with covariates: A Bayesian structural and measurement model of text and covariates. PsyArXiv. https://doi.org/10.31234/osf.io/62tc3

Common Troubleshooting

Ensure that an appropriate C++ compiler is installed on your computer.
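One way to check for a compiler from within R is sketched below. It uses only base R: it asks `R CMD config CXX` which C++ compiler R is configured to use, then looks for that command on the PATH. The variable names are illustrative, and this is a rough diagnostic under those assumptions rather than an official psychtm check.

```r
# Ask R which C++ compiler it is configured to use (base R only)
r_bin <- file.path(R.home("bin"), "R")
cxx <- tryCatch(
  suppressWarnings(
    system2(r_bin, c("CMD", "config", "CXX"), stdout = TRUE, stderr = TRUE)
  ),
  error = function(e) character(0)
)

# Keep the first token of the reported command, e.g. "g++" or "clang++"
tokens <- if (length(cxx) > 0) strsplit(trimws(cxx[1]), "\\s+")[[1]] else character(0)
cxx_cmd <- if (length(tokens) > 0) tokens[1] else ""

# Sys.which() returns "" when the command cannot be found on the PATH
compiler_path <- if (nzchar(cxx_cmd)) unname(Sys.which(cxx_cmd)) else ""
has_compiler <- nzchar(compiler_path)

if (has_compiler) {
  message("C++ compiler found: ", compiler_path)
} else {
  message("No C++ compiler found. Consider installing Rtools (Windows), ",
          "the Xcode Command Line Tools (macOS), or a g++ toolchain (Linux).")
}
```

If the pkgbuild package is available, `pkgbuild::has_build_tools(debug = TRUE)` performs a more thorough check by attempting to compile a small test file.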

Limitations

Getting Help

If you think you have found a bug, please open an issue and provide a minimal, complete, and verifiable example.