Working with the socialrisk Package

Wyatt P. Bensken

2022-03-09

Introduction

The goal of socialrisk is to create an efficient way to identify social risk from administrative health care data using ICD-10 diagnosis codes.

Load Sample Data

The package includes a sample dataset of ICD-10 administrative data, i10_wide, which is available once socialrisk is loaded.

i10_wide
#> # A tibble: 29 × 11
#>    patient_id sex   date_of_serv dx1   dx2   dx3   dx4   dx5   visit_type hcpcs
#>    <fct>      <chr> <date>       <chr> <chr> <chr> <chr> <chr> <chr>      <chr>
#>  1 1001       male  2020-02-14   E876  Z560  Z6372 Z654  E440  ip         E2201
#>  2 1001       male  2021-05-15   J189  Z644  A408  I10   G309  ip         E2201
#>  3 1001       male  2021-01-10   I119  Z628  I10   <NA>  <NA>  ot         E2201
#>  4 1001       male  2021-04-02   G309  K731  Z591  <NA>  <NA>  ot         E2201
#>  5 1001       male  2021-05-06   E039  I10   J189  <NA>  <NA>  ot         E2201
#>  6 1001       male  2021-06-04   J189  Z604  F329  <NA>  <NA>  ot         E2201
#>  7 1001       male  2021-10-01   E0800 G309  I10   <NA>  <NA>  ot         E2201
#>  8 1001       male  2021-11-05   I6011 I10   F329  R930  <NA>  ot         E2201
#>  9 1001       male  2022-02-01   M546  G309  I10   I6011 <NA>  ot         E2201
#> 10 1001       male  2022-03-15   E0800 I10   J189  F329  <NA>  ot         E2201
#> # … with 19 more rows, and 1 more variable: icd_version <dbl>

Preparing the Data

We use the built-in clean_data() function, specifying the dataset, the patient ID variable, the current data format (wide or long), and the prefix of the diagnosis variables. The result is in long format, with one diagnosis code per row.

data <- clean_data(dat = i10_wide,
                   id = patient_id,
                   style = "wide",
                   prefix_dx = "dx")
#> # A tibble: 10 × 2
#>    patient_id dx   
#>    <fct>      <chr>
#>  1 1001       E876 
#>  2 1001       Z560 
#>  3 1001       Z6372
#>  4 1001       Z654 
#>  5 1001       E440 
#>  6 1001       J189 
#>  7 1001       Z644 
#>  8 1001       A408 
#>  9 1001       I10  
#> 10 1001       G309
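
To make the wide-to-long step concrete, here is a minimal sketch of the same kind of reshaping done with tidyr::pivot_longer(). It is not how clean_data() is implemented (the vignette does not show that); it only illustrates the idea. The tibble copies two rows from the sample output above, and the names wide, long, and position are illustrative assumptions.

```r
library(tibble)
library(tidyr)

# Two visits copied from the sample output above (illustrative subset)
wide <- tibble(
  patient_id = c("1001", "1001"),
  dx1 = c("E876", "I119"),
  dx2 = c("Z560", "Z628"),
  dx3 = c("Z6372", "I10"),
  dx4 = c("Z654", NA),
  dx5 = c("E440", NA)
)

# Stack every column whose name starts with "dx" into a single dx column,
# dropping the empty (NA) diagnosis slots
long <- pivot_longer(
  wide,
  cols = starts_with("dx"),
  names_to = "position",   # dx1, dx2, ...; not needed afterwards
  values_to = "dx",
  values_drop_na = TRUE
)

# Keep only the patient identifier and the diagnosis code
long <- long[, c("patient_id", "dx")]
long
```

Dropping the NA slots keeps only diagnoses that were actually recorded, so each remaining row is one patient-diagnosis pair.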

Social Risk

Now we can run the socialrisk() function on the cleaned data, choosing a different taxonomy each time through the taxonomy argument.

Centers for Medicare and Medicaid Services (CMS)

cms <- socialrisk(dat = data, id = patient_id, dx = dx, taxonomy = "cms")
#> # A tibble: 5 × 12
#>   patient_id any_social_risk number_domains z55_education z56_employment
#>   <fct>                <dbl>          <dbl>         <dbl>          <dbl>
#> 1 1001                     1              7             0              1
#> 2 1002                     1              2             1              0
#> 3 1003                     1              2             0              1
#> 4 1004                     0              0             0              0
#> 5 1005                     0              0             0              0
#> # … with 7 more variables: z57_occupation <dbl>, z59_housing <dbl>,
#> #   z60_social <dbl>, z62_upbringing <dbl>, z63_family <dbl>,
#> #   z64_psychosocial <dbl>, z65_psychosocial_other <dbl>
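
The column names in this output (z56_employment, z59_housing, and so on) suggest that each domain corresponds to a three-character ICD-10 Z-code category. The following is a rough sketch of that kind of prefix matching in base R, assuming this reading is correct. It is not the package's implementation, it covers only three of the domains, and the rows for patient 1004 are made up for illustration.

```r
# Long-format diagnoses; the Z codes for 1001 appear in the sample data,
# the rows for 1004 are hypothetical
dx_long <- data.frame(
  patient_id = c("1001", "1001", "1001", "1001", "1004", "1004"),
  dx = c("Z560", "Z591", "Z604", "I10", "I10", "E039"),
  stringsAsFactors = FALSE
)

# Assumed mapping from output column name to ICD-10 category prefix
domains <- c(z56_employment = "Z56", z59_housing = "Z59", z60_social = "Z60")

patients <- unique(dx_long$patient_id)
flags <- data.frame(patient_id = patients, stringsAsFactors = FALSE)

# A patient is flagged for a domain if any of their codes starts with its prefix
for (domain in names(domains)) {
  hit <- startsWith(dx_long$dx, domains[[domain]])
  flags[[domain]] <- as.numeric(patients %in% dx_long$patient_id[hit])
}

# Summary columns named after those in the output above
flags$number_domains <- rowSums(flags[names(domains)])
flags$any_social_risk <- as.numeric(flags$number_domains > 0)
flags
```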

Missouri Hospital Association (MHA)

mha <- socialrisk(dat = data, id = patient_id, dx = dx, taxonomy = "mha")
#> # A tibble: 5 × 8
#>   patient_id any_social_risk number_domains employment family housing
#>   <fct>                <dbl>          <dbl>      <dbl>  <dbl>   <dbl>
#> 1 1001                     1              5          1      1       1
#> 2 1002                     1              2          0      0       1
#> 3 1003                     1              1          1      0       0
#> 4 1004                     0              0          0      0       0
#> 5 1005                     0              0          0      0       0
#> # … with 2 more variables: psychosocial <dbl>, ses <dbl>

SIREN - UCSF

siren <- socialrisk(dat = data, id = patient_id, dx = dx, taxonomy = "siren")
#> Note: The SIREN Compendium assigns multiple domains to each code, resulting in non-mutally exclusive groups.
#> # A tibble: 5 × 19
#>   patient_id any_social_risk number_domains access education employment finances
#>   <fct>                <dbl>          <dbl>  <dbl>     <dbl>      <dbl>    <dbl>
#> 1 1001                     1              5      0         0          1        0
#> 2 1002                     1              6      1         1          0        1
#> 3 1003                     1              1      0         0          1        0
#> 4 1004                     0              0      0         0          0        0
#> 5 1005                     0              0      0         0          0        0
#> # … with 12 more variables: food <dbl>, housing <dbl>, immigration <dbl>,
#> #   incarceration <dbl>, language <dbl>, race_eth <dbl>, safety <dbl>,
#> #   soc_connect <dbl>, stress <dbl>, transportation <dbl>, utilities <dbl>,
#> #   veteran <dbl>
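
The note printed above says that SIREN domains are not mutually exclusive, which is why a patient's number_domains can exceed the number of distinct social risk codes on record. The sketch below shows the mechanism with a many-to-many lookup in base R. The codes CODE1 and CODE2, the patients A and B, and the code-to-domain pairings are all placeholders invented for illustration; they are not taken from the SIREN Compendium or from the package.

```r
# Hypothetical lookup: CODE1 belongs to two domains, CODE2 to one
lookup <- data.frame(
  dx = c("CODE1", "CODE1", "CODE2"),
  domain = c("housing", "finances", "employment"),
  stringsAsFactors = FALSE
)

# Hypothetical long-format data: one code per patient
dx_long <- data.frame(
  patient_id = c("A", "B"),
  dx = c("CODE1", "CODE2"),
  stringsAsFactors = FALSE
)

# Joining on dx yields one row per (patient, code, domain) combination
matched <- merge(dx_long, lookup, by = "dx")

# Count the distinct domains each patient falls into
domains_per_patient <- tapply(
  matched$domain,
  matched$patient_id,
  function(d) length(unique(d))
)
domains_per_patient
```

Patient A has a single code but is counted in two domains, so domain counts under this kind of taxonomy should not be read as counts of distinct codes.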