ICD10gm is an R package for working with the German Modification of the International Statistical Classification of Diseases and Related Health Problems (ICD-10-GM).
The ICD-10 classification is an international standard for the coding of health service data. It is used widely both to document morbidity in healthcare systems, usually in the context of remuneration claims, and to encode mortality statistics. In Germany, the German Institute of Medical Documentation and Information (DIMDI) releases a German Modification (ICD-10-GM) of the classification that forms a compulsory part of all remuneration claims in the ambulatory and hospital sectors. Further information and historical context can be found in, for example, Graubner (2007) or Jetté et al. (2010).
This package was created to facilitate the analysis of data coded using the ICD-10-GM. In particular, it has the following aims:
ICD10gm is designed for use in the context of medical and health services research using routinely collected claims data. It is not suitable for use in operative coding as it does not include all relevant metadata (e.g. inclusion and exclusion notes and the detailed definitions of psychiatric diagnoses). The metadata provided in the ICD10gm package is not intended to replace the official DIMDI documentation, which should always be consulted when specifying ICD codes for analysis.
The following presents an overview of the basic functionality provided by the ICD10gm package, illustrated by means of simple examples. To access this vignette in R, type:
vignette("icd10gm_intro", package = "ICD10gm")
The ICD-10-GM metadata are provided by four data.frames that form the core of the ICD10gm package:
icd_meta_chapters
: Specification of the ICD-10-GM chapters with label and number (Arabic and Roman numerals)
icd_meta_blocks
: Specification of the ICD-10 blocks, each containing a sequence of related ICD-10 codes
icd_meta_codes
: Extended metadata for all ICD-10-GM codes
icd_meta_transitions
: The “crosswalk” table specifying old and new ICD-10-GM codes for successive annual versions.
Documentation for the individual datasets can be accessed using the R help system by typing, for example, either help("icd_meta_codes", package = "ICD10gm") or simply ?icd_meta_codes.
While column names have been translated into English, the ICD-10-GM labels are in German with UTF-8 character encoding throughout.
In addition to this tabular data, several utility functions are provided to perform common queries on the metadata:
get_icd_labels queries icd_meta_codes and returns only the ICD subcodes and labels for specified codes and years.
get_icd_history queries icd_meta_transitions and returns only the mappings for specified codes and years.
icd_showchanges and icd_showchanges_icd3 query icd_meta_transitions and return entries that are affected by changes in the ICD-10-GM versions.
First, we load the ICD10gm package alongside some tidyverse packages:
library(dplyr)
library(purrr)
library(tidyr)
library(ICD10gm)
By way of example, we examine the coding of unspecific gastroenteritis (i.e. without identification of a specific cause), a very common diagnosis in primary care. This is currently classified under the ICD-10 code “A09”, which we can look up for the year 2018 as follows:
get_icd_labels("A09", year = 2018) %>%
  knitr::kable(row.names = FALSE)
year | icd3 | icd_code | icd_normcode | icd_sub | label |
---|---|---|---|---|---|
2018 | A09 | A09.- | A09 | A09 | Sonstige und nicht näher bezeichnete Gastroenteritis und Kolitis infektiösen und nicht näher bezeichneten Ursprungs |
2018 | A09 | A09.0 | A09.0 | A090 | Sonstige und nicht näher bezeichnete Gastroenteritis und Kolitis infektiösen Ursprungs |
2018 | A09 | A09.9 | A09.9 | A099 | Sonstige und nicht näher bezeichnete Gastroenteritis und Kolitis nicht näher bezeichneten Ursprungs |
Now, we check whether this code has been affected by code transitions in any revision since 2003:
icd_showchanges_icd3("A09") %>%
  knitr::kable(row.names = FALSE)
year_from | year_to | icd_from | icd_to | automatic_forward | automatic_backward | change_5 | change_4 | change_3 | change | icd3 | icd_kapitel |
---|---|---|---|---|---|---|---|---|---|---|---|
2009 | 2010 | A09 | A09.0 | | A | TRUE | FALSE | FALSE | TRUE | A09 | A |
2009 | 2010 | A09 | A09.9 | | | TRUE | FALSE | FALSE | TRUE | A09 | A |
2009 | 2010 | K52.9 | A09.9 | | | FALSE | FALSE | TRUE | TRUE | A09 | A |
Diagnoses that were coded as K52.9 up to and including 2009 may be coded as A09.9 from 2010 onwards. We can investigate exactly what changed by looking at the relevant codes for the years 2009 and 2010:
get_icd_labels(icd3 = c("A09", "K52"), year = 2009:2010) %>%
  arrange(year, icd_sub) %>%
  filter(icd_sub %in% c("K529") | icd3 == "A09") %>%
  select(year, icd_normcode, label) %>%
  knitr::kable(row.names = FALSE)
year | icd_normcode | label |
---|---|---|
2009 | A09 | Diarrhoe und Gastroenteritis, vermutlich infektiösen Ursprungs |
2009 | K52.9 | Nichtinfektiöse Gastroenteritis und Kolitis, nicht näher bezeichnet |
2010 | A09 | Sonstige und nicht näher bezeichnete Gastroenteritis und Kolitis infektiösen und nicht näher bezeichneten Ursprungs |
2010 | A09.0 | Sonstige und nicht näher bezeichnete Gastroenteritis und Kolitis infektiösen Ursprungs |
2010 | A09.9 | Sonstige und nicht näher bezeichnete Gastroenteritis und Kolitis nicht näher bezeichneten Ursprungs |
2010 | K52.9 | Nichtinfektiöse Gastroenteritis und Kolitis, nicht näher bezeichnet |
Prior to 2010, A09 had been reserved for gastroenteritis of presumed infectious origin (German: vermutlich infektiösen Ursprungs), with unspecified gastroenteritis coded by K52.9. Since 2010, A09.9 codes any unspecified gastroenteritis, with K52.9 reserved for cases determined to be non-infectious. The effect of this change is that A09.9 has replaced K52.9 as the unspecific code used to document the vast majority of routine cases in primary care. Failure to account for this would constitute a major error in medical or epidemiological research.
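The consequence for an analysis can be sketched with a small base R example. The counts below are invented purely for illustration, and the year-specific specification simply restates the paragraph above (K52.9 up to 2009, A09.9 from 2010). The object names are hypothetical and not part of the package.

```r
# Invented diagnosis counts per year and code (illustration only)
counts <- data.frame(
  year         = c(2009, 2009, 2010, 2010, 2010),
  icd_normcode = c("A09", "K52.9", "A09.0", "A09.9", "K52.9"),
  n            = c(100, 400, 90, 380, 30),
  stringsAsFactors = FALSE
)

# Year-specific specification of "unspecified gastroenteritis"
spec_ge <- data.frame(
  year         = c(2009, 2010),
  icd_normcode = c("K52.9", "A09.9"),
  stringsAsFactors = FALSE
)

# Naive approach: follow K52.9 in both years
naive <- counts[counts$icd_normcode == "K52.9", c("year", "n")]

# Year-aware approach: join on both year and code
# (merge() uses the shared columns year and icd_normcode)
aware <- merge(counts, spec_ge)

naive
aware
```

In this toy data, the naive series suggests a collapse in cases between the two years, whereas the year-aware series remains comparable.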
The function is_icd_code tests whether a character vector represents a valid ICD-10-GM code (i.e. a code listed in the data.frame icd_meta_codes, allowing for alternative code specifications). The test may be limited to a particular version of the ICD-10-GM by specifying the year argument.
The function is_icd_code recognises ICD codes regardless of their formatting, returning TRUE if the string is recognised as an ICD code and FALSE otherwise:
is_icd_code(c("E10.1", "E101", "E10.1-", "J44", "This is not an ICD code"))
#> [1] TRUE TRUE TRUE TRUE FALSE
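To illustrate what format-independent recognition involves, the following base R sketch performs a purely syntactic check and reduces codes to a compact form without dots or dashes. The helpers looks_like_icd and normalise_icd are hypothetical and are not how the package implements is_icd_code. Unlike is_icd_code, the sketch does not consult icd_meta_codes, so it also accepts well-formed codes that do not exist in the classification.

```r
# Hypothetical helper, NOT part of ICD10gm: a purely syntactic check.
# Pattern: one capital letter, two digits, an optional dot, up to two
# further digits, and an optional trailing "-" (subcode placeholder).
looks_like_icd <- function(x) {
  grepl("^[A-Z][0-9]{2}(\\.?[0-9]{1,2})?(\\.?-)?$", x)
}

# Hypothetical helper: strip dots and dashes, e.g. "E10.1-" becomes "E101"
normalise_icd <- function(x) {
  gsub("[.-]", "", x)
}

codes <- c("E10.1", "E101", "E10.1-", "J44", "This is not an ICD code")
is_code <- looks_like_icd(codes)
is_code
unique(normalise_icd(codes[is_code]))
```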
The function icd_parse extracts all ICD-10 codes from an arbitrary character vector. On the one hand, this may be used as in the icd_expand function to convert ICD-10 codes to a standardised format or extract parts of the code. On the other hand, it may be used to extract potentially many ICD-10 codes from any document that can be converted to text format (perhaps using the pdftools package to scrape a PDF document or rvest to scrape a website).
As an example of how ICD10gm can be used to extract ICD codes from arbitrary text, the following code uses the rvest package to scrape the code block “A00-A09” from the online version of the DIMDI ICD-10-GM reference. We apply a filter to exclude codes below A10, thus revealing which other ICD-10 codes are referenced from this block. To simplify the package building process, the code has not been evaluated. This is left as an exercise to the reader.
library(dplyr)
library(rvest)
# Download the block page, extract all ICD codes from its text
# and keep only those that lie outside the block A00-A09
read_html("https://www.dimdi.de/static/de/klassifikationen/icd/icd-10-gm/kode-suche/htmlgm2018/block-a00-a09.htm") %>%
  html_text() %>%
  icd_parse(type = "bounded") %>%
  select(-icd_spec) %>%
  unique() %>%
  filter(icd_sub >= "A10") %>%
  arrange(icd_sub) %>%
  # Attach the 2018 labels to the extracted codes
  left_join(
    get_icd_labels(year = 2018)[, c("icd_sub", "icd_normcode", "label")],
    by = "icd_sub") %>%
  select(icd_normcode, label) %>%
  knitr::kable(row.names = FALSE,
    caption = "Additional ICD-10 codes referred to in block A00-A09 (Intestinal infectious diseases) of the ICD-10-GM (2018).")
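For readers without web access, the general idea of pulling codes out of free text can be tried offline with base R. This is a minimal sketch under stated assumptions: the helper extract_icd, its regular expression and the sample text are all invented for illustration and do not reproduce the matching rules of icd_parse.

```r
# Hypothetical helper, NOT the icd_parse implementation: find substrings
# that look like ICD-10 codes (letter, two digits, optional ".d" or ".dd")
# and are delimited by word boundaries.
extract_icd <- function(text) {
  pattern <- "\\b[A-Z][0-9]{2}(\\.[0-9]{1,2})?\\b"
  hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE))
  # Flatten, de-duplicate and sort independently of the locale
  sort(unique(unlist(hits)), method = "radix")
}

# Invented sample text
txt <- c(
  "Excludes: noninfective colitis (K52.9) and Crohn disease (K50.-).",
  "See also A09.0 and block B99."
)
found <- extract_icd(txt)
found
```

Note that the placeholder form "K50.-" is reduced to "K50" by this simple pattern. Whether the extracted strings are valid codes would still need to be checked against the metadata, for example with is_icd_code.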
The function icd_expand takes a data.frame containing ICD codes and optional metadata as input. It returns a data.frame containing all ICD codes at or below the specified level of the hierarchy (e.g. the specification “E11” is expanded to include all three, four and five-digit codes beginning with E11). Expansion is done within a specified version of the ICD-10-GM (e.g. year 2018).
Irritable bowel syndrome is coded using either the three-digit code K58 (conceiving IBS as the somatic condition) or the code F45.32 (focussing on IBS as a psychosomatic condition). We can retrieve all subcodes in the year 2019 as follows:
icd_k58 <- data.frame(DIAG_GROUP = c("IBS", "IBS"), ICD_SPEC = c("K58", "F45.32")) %>%
  icd_expand(col_icd = "ICD_SPEC", year = 2019, col_meta = "DIAG_GROUP")
knitr::kable(icd_k58)
icd_spec | DIAG_GROUP | year | icd3 | icd_code | icd_normcode | icd_sub | label |
---|---|---|---|---|---|---|---|
K58 | IBS | 2019 | K58 | K58.- | K58 | K58 | Reizdarmsyndrom |
K58 | IBS | 2019 | K58 | K58.1 | K58.1 | K581 | Reizdarmsyndrom, Diarrhoe-prädominant [RDS-D] |
K58 | IBS | 2019 | K58 | K58.2 | K58.2 | K582 | Reizdarmsyndrom, Obstipations-prädominant [RDS-O] |
K58 | IBS | 2019 | K58 | K58.3 | K58.3 | K583 | Reizdarmsyndrom mit wechselnden (gemischten) Stuhlgewohnheiten [RDS-M] |
K58 | IBS | 2019 | K58 | K58.8 | K58.8 | K588 | Sonstiges und nicht näher bezeichnetes Reizdarmsyndrom |
F4532 | IBS | 2019 | F45 | F45.32 | F45.32 | F4532 | Somatoforme autonome Funktionsstörung: Unteres Verdauungssystem |
Note that the data.frame containing the specification should normally be stored as a separate metadata file (e.g. csv or Excel format) to facilitate maintenance and sharing of the specification. The column DIAG_GROUP is a label that can be allocated to one or multiple rows of the specification and is useful when aggregating diagnoses. This is similar to the concept of diagnosis groupers used, for example, in risk adjustment schemes (e.g. as operated by the German Federal Social Insurance Office). In this case, we may want to treat the two alternative codes as equivalent by allocating the label “IBS” to both. In this way, we overcome the common problem that, in practice, multiple codes are used to document the same underlying disease.
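How such a group label might be used in an analysis can be sketched in base R. The specification below copies the icd_sub values from the expanded table above. The claims data, the patient identifiers and the object names are invented for illustration.

```r
# Specification: all expanded codes share the group label "IBS"
spec_ibs <- data.frame(
  DIAG_GROUP = "IBS",
  icd_sub    = c("K58", "K581", "K582", "K583", "K588", "F4532"),
  stringsAsFactors = FALSE
)

# Invented claims data: one row per documented diagnosis
claims <- data.frame(
  patient_id = c(1, 1, 2, 3),
  icd_sub    = c("K581", "F4532", "K588", "J069"),
  stringsAsFactors = FALSE
)

# Keep only diagnoses covered by the specification and attach the label
tagged <- merge(claims, spec_ibs, by = "icd_sub")

# Count distinct patients per diagnosis group
patients_per_group <- tapply(
  tagged$patient_id, tagged$DIAG_GROUP,
  function(id) length(unique(id))
)
patients_per_group
```

Patient 1 has two different IBS codes but is counted once, which is the purpose of grouping equivalent codes under one label.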
The function icd_history takes the result of icd_expand, specified for a particular year, and returns a data.frame containing all corresponding codes for the specified years (from 2003). To do this, it applies the ICD-10-GM transition tables to map codes between successive ICD-10-GM versions. Only automatic transitions are followed to ensure that the specification retains its meaning. Custom transitions, tailored to the needs of the project at hand, can be specified to yield a more complete history.
We historise the code K58, specified for the year 2019, backwards to obtain the corresponding codes for the years 2017 to 2019:
icd_history(icd_k58, years = 2017:2019) %>%
select(icd_spec, DIAG_GROUP, year, icd_code) %>%
arrange(year, icd_code)
#> # A tibble: 12 × 4
#> icd_spec DIAG_GROUP year icd_code
#> <chr> <chr> <int> <chr>
#> 1 F4532 IBS 2017 F45.32
#> 2 K58 IBS 2017 K58.0
#> 3 K58 IBS 2017 K58.9
#> 4 F4532 IBS 2018 F45.32
#> 5 K58 IBS 2018 K58.0
#> 6 K58 IBS 2018 K58.9
#> 7 F4532 IBS 2019 F45.32
#> 8 K58 IBS 2019 K58.-
#> 9 K58 IBS 2019 K58.1
#> 10 K58 IBS 2019 K58.2
#> 11 K58 IBS 2019 K58.3
#> 12 K58 IBS 2019 K58.8
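The principle of following only automatic transitions can be illustrated with a toy crosswalk in base R. Everything here is an assumption made for the sketch: the codes are placeholders rather than real ICD codes, the flag is stored as a logical (the package tables use a different representation), and step_back is a hypothetical helper rather than the logic of icd_history.

```r
# Invented crosswalk between two versions; codes and flags are placeholders
transitions <- data.frame(
  year_from = 2018L,
  year_to   = 2019L,
  icd_from  = c("OLD.1", "OLD.2", "OLD.2", "OLD.3"),
  icd_to    = c("NEW.1", "NEW.2", "NEW.3", "NEW.3"),
  # NEW.3 has two possible predecessors, so its backward mapping is ambiguous
  automatic_backward = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map codes of the newer version to the older one,
# following only transitions flagged as automatic
step_back <- function(codes, transitions) {
  keep <- transitions$icd_to %in% codes & transitions$automatic_backward
  unique(transitions$icd_from[keep])
}

auto_only <- step_back(c("NEW.1", "NEW.2", "NEW.3"), transitions)
auto_only

# A project may decide to accept the ambiguous mapping explicitly
custom <- transitions
custom$automatic_backward[custom$icd_to == "NEW.3"] <- TRUE
with_custom <- step_back("NEW.3", custom)
with_custom
```

With the default flags, NEW.3 has no counterpart in the older version. Only after the project-specific override does it map back to both candidate codes.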
Program code is released under the MIT license.
The underlying ICD-10-GM metadata is copyright of the German Institute of Medical Documentation and Information (DIMDI). The source files are available free of charge from the DIMDI Download Centre. I believe that their use in this package is compatible with the copyright restrictions. In particular:
The distribution of an “added-value product” derived from the original DIMDI classification files is expressly permitted.
Distribution of the original classification files is forbidden. Consequently, this package distributes only the code required to process these files. Those wishing to compile the data from scratch must download the files from the DIMDI download centre and agree to the copyright restrictions.
The package does not modify the ICD-10-GM codes, texts or other metadata in any way other than to restructure the data into a convenient form.
The package does not contain any commercial advertising.
The source of the data is clearly stated, both here and in the main package documentation.
The ICD10gm package provides a convenient means of accessing and manipulating the German modification of the ICD-10 classification. It is designed for use in medical, epidemiological and health services research.
To the author’s knowledge, this package represents the only publicly available repository of pre-processed metadata for the ICD-10-GM. Indeed, a key contribution of the package is the compilation and processing of the metadata provided by DIMDI, which is designed more for the needs of operational use than for the purpose of longitudinal secondary data analysis.
Building on the metadata, the ICD10gm package provides various functions to facilitate the analysis of ICD-10 data. Possible uses include:
Calculating the administrative prevalence of diseases over time, accounting for changes to the ICD-10-GM classification.
The extraction of ICD-10 codes for the purposes of text mining.
The implementation of diagnosis grouping systems for the representation of patient morbidity.
citation(package = "ICD10gm")
#>
#> To cite package 'ICD10gm' in publications use:
#>
#> Ewan Donnachie (2021). ICD10gm: Metadata Processing for the German
#> Modification of the ICD-10 Coding System.
#> https://edonnachie.github.io/ICD10gm/,
#> https://doi.org/10.5281/zenodo.2542833.
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Manual{,
#> title = {ICD10gm: Metadata Processing for the German Modification of the ICD-10 Coding System},
#> author = {Ewan Donnachie},
#> year = {2021},
#> note = {https://edonnachie.github.io/ICD10gm/, https://doi.org/10.5281/zenodo.2542833},
#> }