Interpret regular expressions

library(coder)

Classcodes objects (as described in vignette("classcodes")) use regular expressions to classify individual codes into groups (e.g. comorbidity conditions). Those regular expressions can be hard to interpret on their own. Several methods are therefore available to aid the interpretation of classcodes objects.

visualize()

A graphical representation of a classcodes object is created by visualize(). It will be shown in the default web browser (this requires an Internet connection and is not available within this vignette).

visualize(charlson)

Visualizing all groups (comorbidity conditions) simultaneously might lead to complex figures. We can focus on a specific group (comorbidity) with the group argument. How is myocardial infarction codified by regex_icd9cm_deyo?

visualize(charlson, "myocardial infarction", regex = "icd9cm_deyo")

Hence, all ICD-9 codes starting with 41 followed by either 0 or 2 will be recognized as myocardial infarction according to icd9cm_deyo. The corresponding regular expression for ICD-10 is:

visualize(charlson, "myocardial infarction", regex = "icd10")

Such codes should start with I2 followed by either 1, 2 or 52. The vertical bar | (in the regular expression of the heading) indicates a logical “or”. See ?regex for more details on how to use regular expressions in R (Perl-like versions are currently not allowed).
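The two verbal descriptions above can be checked with base R alone. The following is a minimal sketch: the patterns are reconstructed from the prose and are only assumed to be equivalent to the expressions stored in charlson, and the test codes are hypothetical.

```r
# Assumed patterns, reconstructed from the descriptions above;
# the expressions stored in charlson may be written differently.
icd9_mi  <- "^41[02]"       # 41 followed by 0 or 2
icd10_mi <- "^I2([12]|52)"  # I2 followed by 1, 2 or 52

# Hypothetical codes, chosen to include both matches and non-matches
icd9_codes  <- c("41001", "4129", "4110", "4280")
icd10_codes <- c("I214", "I229", "I252", "I250", "I200")

# grepl() uses extended (not Perl-like) regular expressions by default
icd9_hits  <- grepl(icd9_mi, icd9_codes)
icd10_hits <- grepl(icd10_mi, icd10_codes)

# Named logical vectors: TRUE where the code is recognized
setNames(icd9_hits, icd9_codes)
setNames(icd10_hits, icd10_codes)
```

Note that I252 is recognized through the 52 alternative, whereas I250 is not, since neither 1, 2 nor 52 follows I2.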

summary()

An alternative representation is to list all relevant codes identified by each regular expression. This is implemented by the summary() method for classcodes objects. Note, however, that the regular expressions are stand-alone in each classcodes object; there are no static look-up tables mapping individual codes to each group. We therefore need to specify a code list/dictionary of all possible codes to be recognized by those regular expressions. Then summary() will categorize those and display the result. Common code lists are found in the decoder package and are accessed automatically through the coding argument to summary(). For example, there is a “keyvalue” object icd10cm with all ICD-10-CM codes in {decoder}:

head(decoder::icd10cm)
#>     key                                              value
#> 1  A000 Cholera due to Vibrio cholerae 01, biovar cholerae
#> 2  A001    Cholera due to Vibrio cholerae 01, biovar eltor
#> 3  A009                               Cholera, unspecified
#> 4 A0100                         Typhoid fever, unspecified
#> 5 A0101                                 Typhoid meningitis
#> 6 A0102               Typhoid fever with heart involvement

We can use this code list to identify all codes recognized by charlson with its default classification based on “icd10”. The printed result (see ?print.summary.classcodes) is a tibble with each group and a comma-separated code list.

s <- summary(charlson, coding = "icd10cm")
#> Classification based on: icd10
s
#> 
#> Summary of classcodes object
#> 
#> Recognized codes per group:
#> 
#> # A tibble: 17 x 3
#>    group                      n codes                                           
#>    <chr>                  <int> <chr>                                           
#>  1 AIDS/HIV                   1 B20                                             
#>  2 cerebrovascular disea…   430 G450, G451, G452, G453, G454, G458, G459, G460,…
#>  3 chronic pulmonary dis…    69 I2781, I2782, I2783, I2789, I279, J40, J410, J4…
#>  4 congestive heart fail…    36 I099, I110, I130, I132, I255, I420, I425, I426,…
#>  5 dementia                  11 F0150, F0151, F0280, F0281, F0390, F0391, G300,…
#>  6 diabetes complication    204 E1021, E1022, E1029, E10311, E10319, E103211, E…
#>  7 diabetes without comp…    52 E1010, E1011, E10610, E10618, E10620, E10621, E…
#>  8 hemiplegia or paraple…    45 G041, G114, G801, G802, G8100, G8101, G8102, G8…
#>  9 malignancy               961 C000, C001, C002, C003, C004, C005, C006, C008,…
#> 10 metastatic solid tumor    47 C770, C771, C772, C773, C774, C775, C778, C779,…
#> 11 mild liver disease        38 B180, B181, B182, B188, B189, K700, K7010, K701…
#> 12 moderate or severe li…    14 I8500, I8501, I864, K7040, K7041, K7110, K7111,…
#> 13 myocardial infarction     18 I2101, I2102, I2109, I2111, I2119, I2121, I2129…
#> 14 peptic ulcer disease      36 K250, K251, K252, K253, K254, K255, K256, K257,…
#> 15 peripheral vascular d…   274 I700, I701, I70201, I70202, I70203, I70208, I70…
#> 16 renal disease             28 I120, I1310, I1311, N032, N033, N034, N035, N03…
#> 17 rheumatic disease        348 M0500, M05011, M05012, M05019, M05021, M05022, …
#> 
#>  Use function visualize() for a graphical representation.

A list with all code vectors (for programmatic use) is also returned invisibly and can be accessed by s$codes_vct.
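Conceptually, the categorization amounts to filtering a code list with each regular expression. The sketch below illustrates this idea in base R; it is not the implementation used by summary(), and the group patterns and the code list are simplified, hypothetical stand-ins.

```r
# Hypothetical mini classification: group names and simplified patterns
# (not copied from charlson)
patterns <- c(
  "myocardial infarction" = "^I2([12]|52)",
  "peptic ulcer disease"  = "^K2[5-8]"
)

# Hypothetical code list, playing the role of the keys in a code dictionary
code_list <- c("I2101", "I252", "I250", "K250", "K291", "A000")

# One character vector of recognized codes per group
codes_by_group <- lapply(patterns, grep, x = code_list, value = TRUE)

# Number of recognized codes per group, comparable to the n column above
lengths(codes_by_group)
```

The result is a named list with one code vector per group, which is the same general shape as described for s$codes_vct. Codes that match no pattern (such as A000 here) simply do not appear in any group.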

Now, compare the result above with the output based on a different code list, namely ICD-10-SE, the Swedish version of ICD-10, instead of ICD-10-CM:

summary(charlson, coding = "icd10se")
#> Classification based on: icd10
#> 
#> Summary of classcodes object
#> 
#> Recognized codes per group:
#> 
#> # A tibble: 17 x 3
#>    group                      n codes                                           
#>    <chr>                  <int> <chr>                                           
#>  1 AIDS/HIV                  22 B200, B201, B202, B203, B204, B205, B206, B207,…
#>  2 cerebrovascular disea…    82 G450, G451, G452, G453, G454, G458, G459, G460,…
#>  3 chronic pulmonary dis…    57 I278, I279, J409, J410, J411, J418, J429, J430,…
#>  4 congestive heart fail…    19 I099, I110, I130, I132, I255, I420, I425, I426,…
#>  5 dementia                  23 F000, F001, F002, F009, F010, F011, F012, F013,…
#>  6 diabetes complication     71 E102, E102A, E102B, E102C, E102W, E102X, E103, …
#>  7 diabetes without comp…    55 E100, E100A, E100B, E100C, E100D, E100X, E101, …
#>  8 hemiplegia or paraple…    22 G041, G114, G801, G801A, G801B, G801X, G802, G8…
#>  9 malignancy               525 C000, C001, C002, C003, C004, C005, C006, C008,…
#> 10 metastatic solid tumor    29 C770, C771, C772, C773, C774, C775, C778, C779,…
#> 11 mild liver disease        83 B180, B180A, B180B, B180C, B180D, B180E, B180F,…
#> 12 moderate or severe li…    11 I850, I859, I864, I982, K704, K711, K721, K729,…
#> 13 myocardial infarction     15 I210, I211, I212, I213, I214, I214A, I214B, I21…
#> 14 peptic ulcer disease      36 K250, K251, K252, K253, K254, K255, K256, K257,…
#> 15 peripheral vascular d…    43 I700, I700A, I700B, I700X, I701, I702, I702A, I…
#> 16 renal disease             27 I120, I131, N032, N033, N034, N035, N036, N037,…
#> 17 rheumatic disease         63 M050, M051, M052, M053, M058, M058A, M058B, M05…
#> 
#>  Use function visualize() for a graphical representation.

There are some noticeable differences. AIDS/HIV, for example, has only one code deemed clinically relevant in the USA (thus included in the CM version of ICD-10), although there are 22 different codes potentially used in the Swedish national patient register. There are additional differences concerning the fifth code position (digits in ICD-10-CM and letters in ICD-10-SE). Those mark national modifications to the original ICD-10 codes, which have only 4 positions (one letter and three digits). For this example, the charlson$icd10 column was based on ICD-10-CM (Quan et al. 2005). The comparison above thus highlights potential differences when using this classification in a setting based on another classification (such as with data from the Swedish national patient register).
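Such differences can also be inspected programmatically with set operations on the code vectors. The sketch below uses only codes visible in the printed summaries above (the Swedish AIDS/HIV list is truncated to the eight codes shown); the variable names are hypothetical, and truncating codes to four positions is an illustrative simplification rather than a recommended harmonization method.

```r
# AIDS/HIV codes as printed in the two summaries above
cm_codes <- "B20"                 # ICD-10-CM
se_codes <- paste0("B20", 0:7)    # ICD-10-SE (first eight codes shown)

# Codes recognized in only one of the code lists, and codes in both
only_se <- setdiff(se_codes, cm_codes)
shared  <- intersect(se_codes, cm_codes)

# Dropping the fifth position collapses national modifications
# back to four-position codes
national  <- c("I214", "I214A", "I214B", "I2101")
truncated <- unique(substr(national, 1, 4))
truncated
```

Here no code is shared between the two lists, even though every Swedish code starts with the single ICD-10-CM code B20, which is why the same regular expression yields such different counts.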

If we are interested in another code version, for example as specified by ICD-9-CM (Deyo, Cherkin, and Ciol 1992), this can be specified by the regex argument, passed through the cc_args argument to the set_classcodes function. At the same time, the coding argument is set to icd9cmd to match the regular expressions against the disease part of the ICD-9-CM classification.

summary(
  charlson, coding = "icd9cmd",
  cc_args = list(regex = "icd9cm_deyo")
)
#> 
#> Summary of classcodes object
#> 
#> Recognized codes per group:
#> 
#> # A tibble: 17 x 3
#>    group                      n codes                                           
#>    <chr>                  <int> <chr>                                           
#>  1 AIDS/HIV                   1 042                                             
#>  2 cerebrovascular disea…    69 430, 431, 4320, 4321, 4329, 43300, 43301, 43310…
#>  3 chronic pulmonary dis…     8 490, 500, 501, 502, 503, 504, 505, 5064         
#>  4 congestive heart fail…    15 4280, 4281, 42820, 42821, 42822, 42823, 42830, …
#>  5 dementia                  14 2900, 29010, 29011, 29012, 29013, 29020, 29021,…
#>  6 diabetes complication     12 25040, 25041, 25042, 25043, 25050, 25051, 25052…
#>  7 diabetes without comp…    20 25000, 25001, 25002, 25003, 25010, 25011, 25012…
#>  8 hemiplegia or paraple…    13 34200, 34201, 34202, 34210, 34211, 34212, 34280…
#>  9 malignancy               628 1400, 1401, 1403, 1404, 1405, 1406, 1408, 1409,…
#> 10 metastatic solid tumor    30 1960, 1961, 1962, 1963, 1965, 1966, 1968, 1969,…
#> 11 mild liver disease         7 5712, 57140, 57141, 57142, 57149, 5715, 5716    
#> 12 moderate or severe li…     6 4560, 4561, 5722, 5723, 5724, 5728              
#> 13 myocardial infarction     31 41000, 41001, 41002, 41010, 41011, 41012, 41020…
#> 14 peptic ulcer disease      72 53100, 53101, 53110, 53111, 53120, 53121, 53130…
#> 15 peripheral vascular d…    15 44100, 44101, 44102, 44103, 4411, 4412, 4413, 4…
#> 16 renal disease             26 5820, 5821, 5822, 5824, 58281, 58289, 5829, 583…
#> 17 rheumatic disease          8 7100, 7101, 7104, 7140, 7141, 7142, 71481, 725  
#> 
#>  Use function visualize() for a graphical representation.

codebook()

Even with individual codes summarized, those might still be hard to interpret on their own. The decoder package can help to translate codes to readable names/descriptions. This is facilitated by the codebook() function in the {coder} package.

The main purpose is to export an Excel file (if a path is specified by the file argument). Otherwise, the output is a list including both a summary table (described above) and a tibble “all_codes” explaining the meaning of each code.
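Translating codes into descriptions is essentially a look-up in a key-value table. The following base R sketch shows the idea with three rows copied from the head(decoder::icd10cm) output above; it does not reproduce how {decoder} or codebook() are implemented, and the code Z999 is a hypothetical example of a code missing from the table.

```r
# Small key-value table, copied from the head(decoder::icd10cm) output above
keyvalue <- data.frame(
  key   = c("A000", "A001", "A009"),
  value = c(
    "Cholera due to Vibrio cholerae 01, biovar cholerae",
    "Cholera due to Vibrio cholerae 01, biovar eltor",
    "Cholera, unspecified"
  ),
  stringsAsFactors = FALSE
)

# Codes to translate; "Z999" is deliberately absent from the table
codes <- c("A009", "A000", "Z999")

# match() returns the row of each code in the table, or NA if not found
descriptions <- keyvalue$value[match(codes, keyvalue$key)]

data.frame(code = codes, description = descriptions)
```

Codes that are not found in the table get a missing description (NA) rather than an error, so unknown codes are easy to spot.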

We can compare the codes recognized as AIDS/HIV by either ICD-10-CM or ICD-10-SE:


cm <- codebook(charlson, "icd10cm")$all_codes
#> Classification based on: icd10
cm[cm$group == "AIDS/HIV", ]
#> # A tibble: 1 x 3
#>   code  description                                group   
#>   <chr> <chr>                                      <chr>   
#> 1 B20   Human immunodeficiency virus [HIV] disease AIDS/HIV

se <- codebook(charlson, "icd10se")$all_codes
#> Classification based on: icd10
se[se$group == "AIDS/HIV", ]
#> # A tibble: 22 x 3
#>    code  description                                                     group  
#>    <chr> <chr>                                                           <chr>  
#>  1 B200  HIV-infektion med mykobakterieinfektion                         AIDS/H…
#>  2 B201  HIV-infektion med andra bakterieinfektioner                     AIDS/H…
#>  3 B202  HIV-infektion med cytomegalvirusinfektion                       AIDS/H…
#>  4 B203  HIV-infektion med andra virusinfektioner                        AIDS/H…
#>  5 B204  HIV-infektion med candidainfektion                              AIDS/H…
#>  6 B205  HIV-infektion med andra mykoser                                 AIDS/H…
#>  7 B206  HIV-infektion med Pneumocystis jirovecii (carinii)-pneumoni     AIDS/H…
#>  8 B207  HIV-infektion med multipla infektioner                          AIDS/H…
#>  9 B208  HIV-infektion med andra infektions- och parasitsjukdomar        AIDS/H…
#> 10 B209  HIV-infektion med ospecificerad infektions- eller parasitsjukd… AIDS/H…
#> # … with 12 more rows

codebooks()

Several codebooks can be combined (exported to a single Excel file) by the function codebooks() (note the plural s). This is difficult to illustrate in a vignette, but examples are provided in ?codebooks.

Bibliography

Deyo, Richard A., Daniel C. Cherkin, and Marcia A. Ciol. 1992. “Adapting a Clinical Comorbidity Index for Use with ICD-9-CM Administrative Databases.” Journal of Clinical Epidemiology 45 (6): 613–19. https://doi.org/10.1016/0895-4356(92)90133-8.
Quan, Hude, Vijaya Sundararajan, Patricia Halfon, Andrew Fong, Bernard Burnand, Jean-Christophe Luthi, L. Duncan Saunders, Cynthia A. Beck, Thomas E. Feasby, and William A. Ghali. 2005. “Coding Algorithms for Defining Comorbidities in ICD-9-CM and ICD-10 Administrative Data.” Medical Care 43 (11): 1130–39. https://doi.org/10.1097/01.mlr.0000182534.19832.83.