--- title: "Getting started with cancerR" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Get started} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>") ``` # Getting started This vignette will show you how to use the `cancerR` package to classify cancer subtypes using the information available from pathology reports which are typically coded using the International Classification of Diseases for Oncology (ICD-O) system. This information is typically available in cancer registries and can be used to classify the type of cancer. ```{r setup} library(cancerR) # Make example data data <- data.frame( icd_o3_histology = c("8522", "9490", "9070"), # Different formats of site codes commonly found in cancer registries icd_o3_site = c("C50.1", "C701", "620"), icd_o3_behaviour = c("3", "3", "3") ) head(data) ``` # Convert cancer site The `site_convert()` function can be used to extract the correct site (*a.k.a.* topography) codes and convert them to a standardized numeric format. It is designed to handle both character and numeric input and will automatically detect if the codes are in decimal ("C34.1") or integer ("C341") format and convert them. ```{r} # Convert site codes data$site_conv <- site_convert(data$icd_o3_site, validate = FALSE) head(data) ``` `site_convert()` also has built-in validation to ensure that the site codes have the correct numeric values ranging from "C00.0" to "C97.9". This can be called by specifying the `validate` argument as `TRUE`. ```{r} # Valid site codes site_convert("C34.1", validate = TRUE) # Invalid site codes site_convert("C99.9", validate = TRUE) # Should return NA and an warning message site_convert("C99.9", validate = FALSE) # Should return 999 ``` # Classify adolescent and young adult cancers The `aya_class()` function can be used to classify adolescent and young adult cancer based on the histology, site, and behaviour codes of the cancer. The method used for the classification can be specified using one of the `method` arguments specified below: - `"Barr 2020"` (**default**) - Classification based on the AYA classification by [Barr et al](https://doi.org/10.1002/cncr.33041) - `"SEER 2020"` - [S.E.E.R. 2020 Recode Revision](https://seer.cancer.gov/ayarecode/aya-2020.html) - `"SEER-WHO v2008"` - [S.E.E.R. WHO 2008](https://seer.cancer.gov/ayarecode/aya-who2008.html) - `"SEER v2006"` - [S.E.E.R. 2006](https://seer.cancer.gov/ayarecode/ayarecode-orig.html) Users can also specify the depth of the classification tree using the `depth` argument. The depth parameter specifies the maximum depth of the classification tree, with 1 being the highest level of classification and most general grouping. ```{r} # Classify AYA cancers using Barr 2020 classification # Classify at level 1 (most general) data$dx_lvl_1 <- aya_class(data$icd_o3_histology, data$icd_o3_site, data$icd_o3_behaviour, depth = 1) # Add more granular classifications data$dx_lvl_2 <- aya_class( histology = data$icd_o3_histology, site = data$site_conv, behaviour = data$icd_o3_behaviour, depth = 2 ) # Add even more granular classifications (level 3) using SEER 2020 revision classification data$dx_lvl_3 <- aya_class( histology = data$icd_o3_histology, site = site_convert(data$icd_o3_site), # Convert site codes using site_convert() behaviour = data$icd_o3_behaviour, method = "SEER v2020", depth = 3 ) # View created columns print(data[, c("dx_lvl_1", "dx_lvl_2", "dx_lvl_3")]) ``` # Classify childhood cancers Similarly, the `kid_class()` function can be used to classify childhood cancers. The method used for the classification can be specified using one of the `method` arguments specified below: - `"iccc3"` (**default**) - Classification based on the [International Classification of Childhood Cancer, 3rd ed. (ICCC-3)](https://doi.org/10.1002/cncr.20910) - `"who-iccc3"` - [ICCC-3 Recode ICD-O-3/WHO 2008](https://seer.cancer.gov/iccc/iccc-who2008.html) - `"iarc2017"` - [ICCC-3 / IARC2017](https://seer.cancer.gov/iccc/iccc-iarc-2017.html) ```{r} # Make example data data_kid <- data.frame( histology = c("8522", "9490", "9070"), site = c("C50.1", "C701", "620"), behaviour = c("3", "3", "3") ) # Classify childhood cancers using ICCC-3 classification data_kid$dx_lvl_1 <- kid_class(data_kid$histology, data_kid$site, depth = 1) # ICCC-3 data_kid$dx_lvl_1.seer <- kid_class(data_kid$histology, data_kid$site, method = "who-iccc3", depth = 1) # WHO-SEER recode # Add SEER grouping column data_kid$seer_grp <- kid_class(data_kid$histology, data_kid$site, depth = 99) # View results head(data_kid) ```