Introduction to the ICD10gm Package

Motivation, Basic Usage and Examples

Ewan Donnachie

2023-02-25

Introduction

ICD10gm is an R Package for working with the German Modification of the International Statistical Classification of Diseases and Related Health Problems (ICD-10-GM).

The ICD-10 classification is an international standard for the coding of health service data. It is used widely both to document morbidity in healthcare systems, usually in the context of remuneration claims, and to encode mortality statistics. In Germany, the German Institute of Medical Documentation and Information (DIMDI) releases a German Modification (ICD-10-GM) of the classification that forms a compulsory part of all remuneration claims in the ambulatory and hospital sectors. Further information and historical context can be found in, for example, Graubner (2007) or Jetté et al. (2010).

Aims

This package was created to facilitate the analysis of data coded using the ICD-10-GM. In particular, it has the following aims:

  1. Provide convenient access to the extended ICD-10-GM metadata
  2. Identify and extract ICD-10 codes from character strings
  3. Facilitate the specification of ICD codes for analysis, utilising the ICD hierarchy (e.g. given the specification “A0” return all subcodes in the range “A00” to “A09”)
  4. Enable the historization of ICD specifications when analysing longitudinal claims data, applying the automatic code transitions provided by DIMDI, identifying potentially problematic codes and enabling the specification of custom transitions

ICD10gm is designed for use in the context of medical and health services research using routinely collected claims data. It is not suitable for use in operative coding as it does not include all relevant metadata (e.g. inclusion and exclusion notes and the detailed definitions of psychiatric diagnoses). The metadata provided in the ICD10gm package is not intended to replace the official DIMDI documentation, which should always be consulted when specifying ICD codes for analysis.

The following presents an overview of the basic functionality provided by the ICD10gm package, illustrated by means of simple examples. To access this vignette in R, type:

vignette("icd10gm_intro", package = "ICD10gm")

Basic Use

Access ICD-10-GM metadata

The ICD-10-GM metadata are provided by four data.frames that form the core of the ICD10gm package:

Documentation for the individual datasets can be accessed using the R help system by typing, for example, either help("icd_meta_codes", package = "ICD10gm") or simply ?icd_meta_codes.

While column names have been translated into English, the ICD-10-GM labels are in German with UTF-8 character encoding throughout.

In addition to this tabular data, several utility functions are provided to perform common queries on the metadata.
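Because the metadata are ordinary data.frames, they can also be queried directly with standard tools. The following is a minimal sketch, assuming that icd_meta_codes contains the columns year, icd3, icd_normcode and label. These names are taken from the output of the utility functions shown in this vignette rather than from the dataset documentation, so they should be checked with ?icd_meta_codes.

```r
library(dplyr)
library(ICD10gm)

# All entries belonging to the three-digit code A09 in the 2019 version.
# Column names are an assumption based on the output of get_icd_labels().
a09_2019 <- icd_meta_codes %>%
  filter(year == 2019, icd3 == "A09") %>%
  select(year, icd_normcode, label)

a09_2019
```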

Example

First, we load the ICD10gm package alongside some tidyverse packages:

library(dplyr)
library(purrr)
library(tidyr)
library(ICD10gm)

By way of example, we examine the coding of unspecific gastroenteritis (i.e. without identification of a specific cause), a very common diagnosis in primary care. We can look up the appropriate code as follows:

icd_search("gastroenteritis", level = 3)
#>        year icd3 icd_sub
#> 299205 2023  A09     A09
#> 304723 2023  K52     K52
#>                                                                                                                      label
#> 299205 Sonstige und nicht näher bezeichnete Gastroenteritis und Kolitis infektiösen und nicht näher bezeichneten Ursprungs
#> 304723                                                                Sonstige nichtinfektiöse Gastroenteritis und Kolitis

We see that A09 is used for infectious gastroenteritis, whereas K52 corresponds to non-infectious gastroenteritis. We are interested in A09, but might wish to read up on the details in the official documentation:

icd_browse("A09")

This will open the documentation in our system’s default browser.

Now, we check whether this code has been affected by code transitions in any revision since 2003:

icd_showchanges_icd3("A09") %>%
  knitr::kable(row.names = FALSE)
year_from year_to icd_from icd_to automatic_forward automatic_backward change_5 change_4 change_3 change icd3 icd_kapitel
2009 2010 A09 A09.0 A TRUE FALSE FALSE TRUE A09 A
2009 2010 A09 A09.9 TRUE FALSE FALSE TRUE A09 A
2009 2010 K52.9 A09.9 FALSE FALSE TRUE TRUE A09 A

Some diagnoses that, up to 2009, were coded as K52.9 are coded as A09.9 from 2010 onwards. We can investigate exactly what changed by looking at the relevant codes for the years 2009 and 2010:

get_icd_labels(icd3 = c("A09", "K52"), year = 2009:2010) %>%
  arrange(year, icd_sub) %>% 
  filter(icd_sub %in% c("K529") | icd3 == "A09") %>% 
  select(year, icd_normcode, label) %>% 
  knitr::kable(row.names = FALSE)

| year | icd_normcode | label |
|-----:|:-------------|:------|
| 2009 | A09          | Diarrhoe und Gastroenteritis, vermutlich infektiösen Ursprungs |
| 2009 | K52.9        | Nichtinfektiöse Gastroenteritis und Kolitis, nicht näher bezeichnet |
| 2010 | A09          | Sonstige und nicht näher bezeichnete Gastroenteritis und Kolitis infektiösen und nicht näher bezeichneten Ursprungs |
| 2010 | A09.0        | Sonstige und nicht näher bezeichnete Gastroenteritis und Kolitis infektiösen Ursprungs |
| 2010 | A09.9        | Sonstige und nicht näher bezeichnete Gastroenteritis und Kolitis nicht näher bezeichneten Ursprungs |
| 2010 | K52.9        | Nichtinfektiöse Gastroenteritis und Kolitis, nicht näher bezeichnet |

Prior to 2010, A09 had been reserved for gastroenteritis of presumed infectious origin (German: vermutlich infektiösen Ursprungs), with unspecified gastroenteritis coded by K52.9. Since 2010, A09.9 codes any unspecified gastroenteritis, with K52.9 reserved for cases determined to be non-infectious. The effect of this change is that A09.9 has replaced K52.9 as the unspecific code used to document the vast majority of routine cases in primary care. Failure to account for this would constitute a major error in medical or epidemiological research.
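To illustrate how such a change can be respected in an analysis, the following sketch flags unspecified gastroenteritis using a year-dependent code. The claims data and the column names patient_id, year and icd_code are hypothetical and serve only to demonstrate the idea; the sketch does not use any ICD10gm function.

```r
# Hypothetical claims data: one row per documented diagnosis
claims <- data.frame(
  patient_id = c(1, 2, 3, 4, 5),
  year       = c(2008, 2009, 2010, 2011, 2011),
  icd_code   = c("K52.9", "A09", "A09.9", "K52.9", "A09.9"),
  stringsAsFactors = FALSE
)

# Year-dependent definition of "unspecified gastroenteritis",
# following the change described above: K52.9 up to 2009, A09.9 from 2010
unspecified_code <- ifelse(claims$year <= 2009, "K52.9", "A09.9")

# TRUE where the documented code is the unspecified code of that year
claims$unspecified_ge <- claims$icd_code == unspecified_code

# Number of unspecified cases per year
table(claims$year[claims$unspecified_ge])
```

In this toy data, the K52.9 claim from 2011 is not counted, because from 2010 onwards that code is reserved for cases determined to be non-infectious.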

Test whether a string represents an ICD-10-GM code

The function is_icd_code tests whether a character vector represents a valid ICD-10-GM code (i.e. a code listed in the data.frame icd_meta_codes, allowing for alternative code specifications). The test may be limited to a particular version of the ICD-10-GM by specifying the year argument.

Examples

The function is_icd_code recognises ICD codes regardless of their formatting, returning TRUE if the string is recognised as an ICD code and FALSE otherwise:

is_icd_code(c("E10.1", "E101", "E10.1-", "J44", "This is not an ICD code"))
#> [1]  TRUE  TRUE  TRUE  TRUE FALSE
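A typical use is to filter a column of raw entries before further processing. The sketch below assumes a hypothetical character vector raw and uses the year argument described above; no output is shown for the year-restricted call because the result depends on the version of the classification.

```r
library(ICD10gm)

# Hypothetical diagnosis field from a data extract
raw <- c("E10.1", "J44", "This is not an ICD code")

# Keep only the entries recognised as ICD-10-GM codes in any version
valid <- raw[is_icd_code(raw)]

# Restrict the check to a single version of the classification
valid_2019 <- raw[is_icd_code(raw, year = 2019)]
```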

Extracting ICD codes from a string

The function icd_parse extracts all ICD-10 codes from an arbitrary character vector. On the one hand, this may be used as in the icd_expand function to convert ICD-10 codes to a standardised format or extract parts of the code. On the other hand, it may be used to extract potentially many ICD-10 codes from any document that can be converted to text format (perhaps using the pdftools package to scrape a PDF document or rvest to scrape a website).
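Before turning to a full document, the idea can be tried on a single string. This is a minimal sketch: the text in note is hypothetical, type = "bounded" is the option used in the scraping example below, and the columns icd_spec and icd_sub are assumed to be present because that example relies on them.

```r
library(ICD10gm)

# Hypothetical free-text note mentioning two diagnoses
note <- "Diagnoses: K58.1 and F45.32 (irritable bowel syndrome)"

# Extract all ICD-10 codes found in the text
codes <- icd_parse(note, type = "bounded")

# Show the code as specified and in its normalised short form
codes[, c("icd_spec", "icd_sub")]
```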

Example: Scraping codes from a website

As an example of how ICD10gm can be used to extract ICD codes from arbitrary text, the following code uses the rvest package to scrape the code block “A00-A09” from the online version of the DIMDI ICD-10-GM reference. We then filter out the codes below A10 (i.e. those belonging to the block itself), revealing which other ICD-10 codes are referenced from this block. To simplify the package building process, the code has not been evaluated. This is left as an exercise for the reader.

library(dplyr)
library(rvest)
library(ICD10gm)

# Download the page for block A00-A09 and reduce it to plain text
read_html("https://www.dimdi.de/static/de/klassifikationen/icd/icd-10-gm/kode-suche/htmlgm2018/block-a00-a09.htm") %>% 
  html_text() %>%
  # Extract every ICD-10 code mentioned in the text
  icd_parse(type = "bounded") %>%
  select(-icd_spec) %>% 
  unique() %>% 
  # Keep only codes from outside the block A00-A09
  filter(icd_sub >= "A10") %>% 
  arrange(icd_sub) %>% 
  # Attach the official labels for the 2018 version
  left_join(
    get_icd_labels(year = 2018)[, c("icd_sub", "icd_normcode", "label")],
    by = "icd_sub") %>% 
  select(icd_normcode, label) %>% 
  knitr::kable(row.names = FALSE,
               caption = "Additional ICD-10 codes referred to in block A00-A09 (Intestinal infectious diseases) of the ICD-10-GM (2018).")

Expand an ICD specification down the hierarchy

The function icd_expand takes a data.frame containing ICD codes and optional metadata as input. It returns a data.frame containing all ICD codes at or below the specified level of the hierarchy (e.g. the specification “E11” is expanded to include all three, four and five-digit codes beginning with E11). Expansion is done within a specified version of the ICD-10-GM (e.g. year 2018).

Example

Irritable bowel syndrome is coded using either the three-digit code K58 (conceiving IBS as the somatic condition) or the code F45.32 (focussing on IBS as a psychosomatic condition). We can retrieve all subcodes in the year 2019 as follows:

icd_k58 <- data.frame(DIAG_GROUP = c("IBS", "IBS"), ICD_SPEC = c("K58", "F45.32")) %>% 
  icd_expand(col_icd = "ICD_SPEC", year = 2019, col_meta = "DIAG_GROUP")
  
knitr::kable(icd_k58)

| icd_spec | DIAG_GROUP | year | icd3 | icd_code | icd_normcode | icd_sub | label |
|:---------|:-----------|-----:|:-----|:---------|:-------------|:--------|:------|
| K58      | IBS        | 2019 | K58  | K58.-    | K58          | K58     | Reizdarmsyndrom |
| K58      | IBS        | 2019 | K58  | K58.1    | K58.1        | K581    | Reizdarmsyndrom, Diarrhoe-prädominant [RDS-D] |
| K58      | IBS        | 2019 | K58  | K58.2    | K58.2        | K582    | Reizdarmsyndrom, Obstipations-prädominant [RDS-O] |
| K58      | IBS        | 2019 | K58  | K58.3    | K58.3        | K583    | Reizdarmsyndrom mit wechselnden (gemischten) Stuhlgewohnheiten [RDS-M] |
| K58      | IBS        | 2019 | K58  | K58.8    | K58.8        | K588    | Sonstiges und nicht näher bezeichnetes Reizdarmsyndrom |
| F4532    | IBS        | 2019 | F45  | F45.32   | F45.32       | F4532   | Somatoforme autonome Funktionsstörung: Unteres Verdauungssystem |

Note that the data.frame containing the specification should normally be stored as a separate metadata file (e.g. in CSV or Excel format) to facilitate maintenance and sharing of the specification. The column DIAG_GROUP is a label that can be allocated to one or multiple rows of the specification and is useful when aggregating diagnoses. This is similar to the concept of diagnosis groupers used, for example, in risk adjustment schemes (e.g. as operated by the German Federal Social Insurance Office). In this case, we may want to treat the two alternative codes as equivalent by allocating the label “IBS” to both. In this way, we overcome the common problem that, in practice, multiple codes are used to document the same underlying disease.
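A minimal sketch of this workflow is given below. The specification is written to a temporary CSV file purely so that the example is self-contained; in practice the file would be maintained alongside the project. The object names spec_file, spec and icd_ibs are illustrative only, and the call to icd_expand uses the same arguments as the example above.

```r
library(ICD10gm)

# Write a small specification file to a temporary location
spec_file <- tempfile(fileext = ".csv")
writeLines(
  c("DIAG_GROUP,ICD_SPEC",
    "IBS,K58",
    "IBS,F45.32"),
  spec_file
)

# Read the specification and expand it for the 2019 version
spec <- read.csv(spec_file, stringsAsFactors = FALSE)
icd_ibs <- icd_expand(spec, col_icd = "ICD_SPEC", year = 2019,
                      col_meta = "DIAG_GROUP")

# Number of expanded codes per diagnosis group
table(icd_ibs$DIAG_GROUP)
```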

Historise an ICD specification

The function icd_history takes the result of icd_expand, specified for a particular year, and returns a data.frame containing all corresponding codes for the specified years (from 2003). To do this, it applies the ICD-10-GM transition tables to map codes between successive ICD-10-GM versions. Only automatic transitions are followed to ensure that the specification retains its meaning. Custom transitions, tailored to the needs of the project at hand, can be specified to yield a more complete history.

Example

We historise the IBS specification created above (K58 and F45.32), specified for the year 2019, backwards to obtain the corresponding codes for the years 2017 to 2019:

icd_history(icd_k58, years = 2017:2019) %>% 
  select(icd_spec, DIAG_GROUP, year, icd_code) %>% 
  arrange(year, icd_code)
#> # A tibble: 12 × 4
#>    icd_spec DIAG_GROUP  year icd_code
#>    <chr>    <chr>      <int> <chr>   
#>  1 F4532    IBS         2017 F45.32  
#>  2 K58      IBS         2017 K58.0   
#>  3 K58      IBS         2017 K58.9   
#>  4 F4532    IBS         2018 F45.32  
#>  5 K58      IBS         2018 K58.0   
#>  6 K58      IBS         2018 K58.9   
#>  7 F4532    IBS         2019 F45.32  
#>  8 K58      IBS         2019 K58.-   
#>  9 K58      IBS         2019 K58.1   
#> 10 K58      IBS         2019 K58.2   
#> 11 K58      IBS         2019 K58.3   
#> 12 K58      IBS         2019 K58.8
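The historised specification can then be matched against claims data by year and code. The following sketch assumes hypothetical claims data with the columns patient_id, year and icd_code. To remain self-contained, it copies a subset of the rows printed above into a data.frame instead of calling icd_history.

```r
library(dplyr)

# Subset of the historised specification, copied from the output above
ibs_history <- data.frame(
  DIAG_GROUP = "IBS",
  year       = c(2017, 2017, 2017, 2019, 2019),
  icd_code   = c("F45.32", "K58.0", "K58.9", "F45.32", "K58.1"),
  stringsAsFactors = FALSE
)

# Hypothetical claims data
claims <- data.frame(
  patient_id = c(1, 2, 3, 4),
  year       = c(2017, 2017, 2019, 2019),
  icd_code   = c("K58.9", "J44.1", "K58.1", "K58.9"),
  stringsAsFactors = FALSE
)

# Match on both year and code, so that each claim is compared only
# against the codes listed for its own year
ibs_claims <- inner_join(claims, ibs_history, by = c("year", "icd_code"))

ibs_claims
```

Joining on the year as well as the code means that the claim with K58.9 in 2019 is not matched, since K58.9 appears in the history only for 2017 and 2018.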

Summary

The ICD10gm package provides a convenient means of accessing and manipulating the German modification of the ICD-10 classification. It is designed for use in medical, epidemiological and health services research.

To the author’s knowledge, this package represents the only publicly available repository of pre-processed metadata for the ICD-10-GM. Indeed, a key contribution of the package is the compilation and processing of the metadata provided by DIMDI, which is designed more for the needs of operational use than for the purpose of longitudinal secondary data analysis.

Building on the metadata, the ICD10gm package provides various functions to facilitate the analysis of ICD-10 data. Possible uses include:

Cite

citation(package = "ICD10gm")
#> 
#> To cite package 'ICD10gm' in publications use:
#> 
#>   Donnachie E (2023). _ICD10gm: Metadata Processing for the German
#>   Modification of the ICD-10 Coding System_.
#>   https://edonnachie.github.io/ICD10gm/,
#>   https://doi.org/10.5281/zenodo.2542833.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {ICD10gm: Metadata Processing for the German Modification of the ICD-10 Coding System},
#>     author = {Ewan Donnachie},
#>     year = {2023},
#>     note = {https://edonnachie.github.io/ICD10gm/, https://doi.org/10.5281/zenodo.2542833},
#>   }