--- title: "Introduction to the ICD10gm Package" subtitle: "Motivation, Basic Usage and Examples" author: "Ewan Donnachie" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to the ICD10gm Package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Introduction ICD10gm is an R Package for working with the German Modification of the International Statistical Classification of Diseases and Related Health Problems (ICD-10-GM). The ICD-10 classification is an international standard for the coding of health service data. It is used widely both to document morbidity in healthcare systems, usually in the context of remuneration claims, and to encode [mortality statistics](https://www.who.int/data/gho/data/themes/mortality-and-global-health-estimates). In Germany, the [German Instutite of Medical Documentation and Information (DIMDI)](https://www.dimdi.de) releases a German Modification (ICD-10-GM) of the classification that forms a compulsory part of all remuneration claims in the ambulatory and hospital sectors. Further information and historical context can be found in, for example, [Graubner (2007)](https://doi.org/10.1007/s00103-007-0283-x) or [Jetté _et al_ (2010)](https://www.jstor.org/stable/25767019). ## Aims This package was created to facilitate the analysis of data coded using the ICD-10-GM. In particular, it has the following aims: 1. Provide convenient access to the extended ICD-10-GM metadata 2. Identify and extract ICD-10 codes from character strings 3. Facilitate the specification of ICD codes for analysis, utilising the ICD hierarchy (e.g. given the specification "A0" return all subcodes in the range "A01" to "A09") 3. Enable the historization of ICD specifications when analysing longitudinal claims data, applying the automatic code transitions provided by DIMDI, identifying potentially problematic codes and enabling the specification of custom transitions ICD10gm is designed for use in the context of medical and health services research using routinely collected claims data. It is not suitable for use in operative coding as it does not include all relevant metadata (e.g. inclusion and exclusion notes and the detailed definitions of psychiatric diagnoses). The metadata provided in the ICD10gm package is not intended to replace the [official DIMDI documentation](https://www.dimdi.de/dynamic/de/klassifikationen/icd/icd-10-gm/), which should always be consulted when specifying ICD codes for analysis. The following presents an overview of the basic functionality provided by the ICD10gm package, illustrated by means of simple examples. To access this vignette in R, type: ```{r, eval = FALSE} vignette("icd10gm_intro", package = "ICD10gm") ``` # Basic Use ## Access ICD-10-GM metadata The ICD-10-GM metadata are provided by four data.frames that form the core of the ICD10gm package: - `icd_meta_chapters`: Specification of the ICD-10-GM chapters with label and number (Arabic and Roman numerals) - `icd_meta_blocks`: Specification of the ICD-10 blocks, each containing a sequence of related ICD-10 codes - `icd_meta_codes`: Extended metadata for all ICD-10-GM codes - `icd_meta_transitions`: The "crosswalk" table specifying old and new ICD-10-GM codes for successive annual versions. Documentation for the individual datasets can be accessed using the R help system by typing, for example, either `help("icd_meta_codes", package = "ICD10gm")` or simply `?icd_meta_codes`. While column names have been translated into English, the ICD-10-GM labels are in German with UTF-8 character encoding throughout. In addition to this tabular data, several utility functions are provided to perform common queries on the metadata. - __Interactive code lookup:__ `icd_lookup` (lookup ICD-10 code in console), `icd_browse` (lookup ICD-10-GM in the official Online documentation) and `icd_search` (search through the ICD-10-GM labels). - __Manual code history validation:__ `icd_showchanges` and `icd_showchanges_icd3` query `icd_meta_transitions` and return entries that are affected by changes in the ICD-10-GM versions. ### Example First, we load the ICD10gm package alongside some [tidyverse packages](https://www.tidyverse.org): ```{r, message = FALSE} library(dplyr) library(purrr) library(tidyr) library(ICD10gm) ``` By way of example, we examine the coding of unspecific gastroenteritis (i.e. without identification of a specific cause), a very common diagnosis in primary care. We can look up the appropriate code as follows: ```{r} icd_search("gastroenteritis", level = 3) ``` We see that A09 is used for infectious gastroenteritis, whereas K52 corresponds to non-infectious gastroenteritis. We are interested in A09, but might wish to read up on the details in the official documentation: ```{r, eval = FALSE} icd_browse("A09") ``` This will open the documentation in our system's default browser. Now, we check whether whether this code has been affected by code transitions in any revision since 2003: ```{r} icd_showchanges_icd3("A09") %>% knitr::kable(row.names = FALSE) ``` Diagnoses that, prior to 2009, were coded as K52.9 are now coded as A09.9. We can investigate exactly what changed by looking the relevant codes for the years 2009 and 2010: ```{r} get_icd_labels(icd3 = c("A09", "K52"), year = 2009:2010) %>% arrange(year, icd_sub) %>% filter(icd_sub %in% c("K529") | icd3 == "A09") %>% select(year, icd_normcode, label) %>% knitr::kable(row.names = FALSE) ``` Prior to 2010, A09 had been reserved for gastroenteritis of presumed infectious origin (German: _vermutlich infektiösen Ursprungs_), with unspecified gastroenteritis coded by K52.9. Since 2010, A09.9 codes any unspecified gastroenteritis, with K52.9 reserved for cases determined to be non-infectious. The effect of this change is that A09.9 has replaced K52.9 as the unspecific code used to document the vast majority of routine cases in primary care. Failure to account for this would constitute a major error in medical or epidemiological research. ## Test whether a string represents an ICD-10-GM code The function `is_icd_code` tests whether a character vector represents a valid ICD-10-GM code (i.e. a code listed in the data.frame `icd_meta_codes`, allowing for alternative code specifications). The test may be limited to a particular version of the ICD-10-GM by specifying the `year` argument. ### Examples The function `is_icd_code` recognises ICD codes regardless of their formatting, returning `TRUE` if the string is recognised as an ICD code and `FALSE` otherwise: ```{r} is_icd_code(c("E10.1", "E101", "E10.1-", "J44", "This is not an ICD code")) ``` ## Extracting ICD codes from a string The function `icd_parse` extracts all ICD-10 codes from an arbitrary character vector. On the one hand, this may be used as in the `icd_expand` function to convert ICD-10 codes to a standardised format or extract parts of the code. On the other hand, it may be used to extract potentially many ICD-10 codes from any document that can be converted to text format (perhaps using the [`pdftools` package](https://CRAN.R-project.org/package=pdftools) to scrape a PDF document or [`rvest`](https://CRAN.R-project.org/package=rvest) to scrape a website). ### Example: Scraping codes from a website As an example of how ICD10gm can be used to extract ICD codes from arbitrary text, the following code uses the rvest package to scrape the code block "A00-A09" from the online version of the DIMDI ICD-10-GM reference. We apply the filter to exclude codes below A10, thus revealing which other ICD-10 codes are reference from this block. To simply the package building process, the code has not been evaluated. This is left as an exercise to the reader. ```{r, message = FALSE, eval = FALSE} library(dplyr) library(rvest) read_html("https://www.dimdi.de/static/de/klassifikationen/icd/icd-10-gm/kode-suche/htmlgm2018/block-a00-a09.htm") %>% html_text() %>% icd_parse(type = "bounded") %>% select(-icd_spec) %>% unique() %>% filter(icd_sub >= "A10") %>% arrange(icd_sub) %>% left_join( get_icd_labels(year = 2018)[, c("icd_sub", "icd_normcode", "label")], by = "icd_sub") %>% select(icd_normcode, label) %>% knitr::kable(row.names = FALSE, caption = "Additional ICD-10 codes referred to in block A00-A09 (Intestional infectious diseases) of the ICD-10-GM (2018).") ``` ## Expand a ICD specification down the hierarchy The function `icd_expand` takes a data.frame containing ICD codes and optional metadata as input. It returns a data.frame containing all ICD codes at or below the specified level of the hierarchy (e.g. the specification "E11" is expanded to include all three, four and five-digit codes beginning with E11). Expansion is done within a specified version of the ICD-10-GM (e.g. year 2018). ### Example Irritable bowel syndrome is coded using either the three-digit code K58 (conceiving IBS as the somatic condition) or the code F45.32 (focussing on IBS as a psychosomatic condition). We can retrieve all subcodes in the year 2019 as follows: ```{r} icd_k58 <- data.frame(DIAG_GROUP = c("IBS", "IBS"), ICD_SPEC = c("K58", "F45.32")) %>% icd_expand(col_icd = "ICD_SPEC", year = 2019, col_meta = "DIAG_GROUP") knitr::kable(icd_k58) ``` Note that the `data.frame` containing the specification should normally be stored as a separate metadata file (eg. csv or Excel format) to facilitate maintenance and sharing of the specification. The column `DIAG_GROUP` is a label that can be allocated to one or multiple rows of the specification and is useful when aggregating diagnoses. This is similar to the concept of diagnosis groupers used, for example, in risk adjustment schemes (e.g. as operated by the [German Federal Social Insurance Office](https://www.bundesamtsozialesicherung.de/de/themen/risikostrukturausgleich/ueberblick/)). In this case, we may want to treat the two alternative codes as equivalent by allocating the label "IBS" to both. In this way, we overcome the common problem that, in practice, multiple codes are used to document the same underlying disease. ## Historise an ICD specification The function `icd_history` takes the result of `icd_expand`, specified for a particular year, and returns a data.frame containing all corresponding codes for the specified years (from 2003). To do this, it applies the ICD-10-GM transition tables to map codes between successive ICD-10-GM versions. Only automatic transitions are followed to ensure that the specification retains its meaning. Custom transitions, tailored to the needs of the project at hand, can be specified to yield a more complete history. ## Example We historise the code K58, specified for the year 2019, backwards to obtain the corresponding codes for the years 2017 to 2019: ```{r} icd_history(icd_k58, years = 2017:2019) %>% select(icd_spec, DIAG_GROUP, year, icd_code) %>% arrange(year, icd_code) ``` # Copyright Program code is released under the [MIT license](https://edonnachie.github.io/ICD10gm/LICENSE-text.html). The underlying ICD-10-GM metadata is copyright of the [German Instutite of Medical Documentation and Information (DIMDI)](https://www.dimdi.de). The source files are available free of charge from the [DIMDI Download Centre](https://www.dimdi.de/dynamic/de/klassifikationen/downloads/?dir=icd-10-gm). I believe that their use in this package is compatible with the copyright restrictions. In particular: - The distribution of an "added-value product" derived from the original DIMDI classification files is expressly permitted. - Distribution of the original classification files is forbidden. Consequently, this package distributes only the code required to process these files. Those wishing to compile the data from scratch must download the files from the DIMDI download centre and agree to the copyright restrictions. - The package does not modify the ICD-10-GM codes, texts or other metadata in any way other than to restructure the data into a convenient form. - The package does not contain any commercial advertising. - The source of the data is clearly stated, both here and in the main package documentation. # Summary The ICD10gm package provides a convenient means of accessing and manipulating the German modification of the ICD-10 classification. It is designed for use in medical, epidemiological and health services research. To the author's knowledge, this package represents the only publicly available repository of pre-processed metadata for the ICD-10-GM. Indeed, a key contribution of the package is the compilation and processing of the metadata provided by DIMDI, which is designed more for the needs of operational use than for the purpose of longitudinal secondary data analysis. Building on the metadata, the ICD10gm package provides various functions to facilitate the analysis of ICD-10 data. Possible uses include: - Calculating the administrative prevalence of diseases over time, accounting for changes to the ICD-10-GM classification. - The extraction of ICD-10 codes for the purposes of text mining. - The implementation of diagnoses grouping systems for the representation of patient morbidity. ## Related Packages The [icd package](https://CRAN.R-project.org/package=icd) has similar aims but has a slightly different approach and focusses on the US version of the ICD-10 system. ## Cite ```{r, warning=FALSE} citation(package = "ICD10gm") ```