The add_decimal function reinserts decimal points into ICD codes. This is particularly useful for standardizing ICD code formats in datasets where the decimal points have been removed: without them, code formatting becomes inconsistent and codes cannot be reliably aligned with the standardized comorbidity indices.
# Example ICD code dataframe
df <- data.frame(
id = c(1, 2, 3),
icd_1 = c("C509", "D633", "I210"),
icd_2 = c("D509", "E788", "N183")
)
# Adding decimal to the ICD codes
formatted_df <- add_decimal(df, icd_cols = c("icd_1", "icd_2"))
# Displaying the updated dataframe
print(formatted_df)
#>   id icd_1 icd_2
#> 1  1 C50.9 D50.9
#> 2  2 D63.3 E78.8
#> 3  3 I21.0 N18.3
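To illustrate what this reformatting involves, here is a minimal base R sketch. The helper insert_decimal_sketch is hypothetical and is not how add_decimal is implemented. It assumes the decimal point belongs after the third character, which is where ICD-10 places it (C509 becomes C50.9). ICD-9 numeric and V codes also put it after the third character, but ICD-9 E codes (for example E812.0) put it after the fourth, so the sketch does not handle them.

```r
# Hypothetical helper, not part of icdcomorbid: insert a decimal point after
# the third character of each code (ICD-10 style, e.g. "C509" -> "C50.9").
insert_decimal_sketch <- function(codes) {
  # Normalize case and surrounding whitespace first
  codes <- toupper(trimws(codes))
  # Codes of three characters or fewer (e.g. "I10") are left unchanged
  ifelse(nchar(codes) > 3,
         paste0(substr(codes, 1, 3), ".", substr(codes, 4, nchar(codes))),
         codes)
}

insert_decimal_sketch(c("C509", "D633", "I210", "I10"))
```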
The long_to_wide function reshapes data from a long format (multiple rows per patient) to a wide format (one row per patient, with multiple columns for diagnoses). It supports batch processing to handle large datasets efficiently: the batch_size parameter controls the number of rows processed in each batch. This transformation is required before applying the icd9_to_comorbid and icd10_to_comorbid functions.
# Example long format data with multiple rows per patient
long_data <- data.frame(
patient_id = c(1, 1, 2, 2, 3),
icd_1 = c("A01", "A02", "B01", "B02", "C01"),
icd_2 = c("D01", "E02", "F01", "G02", "H01")
)
# Reshaping the data to wide format
wide_data <- long_to_wide(long_data, idx = "patient_id", icd_cols = c("icd_1", "icd_2"))
# Displaying the reshaped data
print(wide_data)
#>   patient_id icd_1 icd_2 icd_3 icd_4 icd_5
#> 1          1   A01   D01   A02   E02    NA
#> 2          2   B01   F01   B02   G02    NA
#> 3          3   C01   H01  <NA>  <NA>    NA
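The reshaping step can be sketched in base R as follows. long_to_wide_sketch is a hypothetical helper, not the package function: it reads each patient's codes row by row (icd_1 then icd_2 of the first row, then the next row) and pads patients with fewer codes with NA. Unlike the output above, it stops at the widest patient, so no all-NA icd_5 column appears, and it ignores batch_size.

```r
# Hypothetical helper, not icdcomorbid's long_to_wide(): reshape a long table
# (several rows per patient) into one row per patient.
long_to_wide_sketch <- function(df, idx, icd_cols) {
  # One data frame of code columns per patient
  groups <- split(df[icd_cols], df[[idx]])
  # Read each patient's codes row by row: row 1 icd_1, row 1 icd_2, row 2 icd_1, ...
  codes <- lapply(groups, function(g) as.vector(t(as.matrix(g))))
  n_max <- max(lengths(codes))
  # Pad patients with fewer codes with NA so every row has n_max entries
  padded <- lapply(codes, function(x) c(x, rep(NA_character_, n_max - length(x))))
  wide <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(wide) <- paste0("icd_", seq_len(n_max))
  # split() names groups by their text form; recover the original id values
  ids <- unique(df[[idx]])
  ids <- ids[match(names(codes), as.character(ids))]
  out <- cbind(setNames(data.frame(ids), idx), wide)
  rownames(out) <- NULL
  out
}

long_toy <- data.frame(
  patient_id = c(1, 1, 2, 2, 3),
  icd_1 = c("A01", "A02", "B01", "B02", "C01"),
  icd_2 = c("D01", "E02", "F01", "G02", "H01"),
  stringsAsFactors = FALSE
)
wide_toy <- long_to_wide_sketch(long_toy, idx = "patient_id", icd_cols = c("icd_1", "icd_2"))
wide_toy
```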
The icdcomorbid R package includes functions that map ICD-9 and ICD-10 codes to standard comorbidity indices. For both ICD versions you can choose between the Charlson and Quan-Elixhauser comorbidity indices; pick the ICD version and index that match your data. Note that ICD-9 and ICD-10 codes require different mappings. Batch processing is also supported through the batch_size parameter. Your data should be formatted correctly (i.e., in wide format) before applying these functions.
ICD-9 Codes: Use the icd9_to_comorbid function and select the appropriate index such as “charlson9” or “elixhauser9”.
ICD-10 Codes: Use the icd10_to_comorbid function and select the corresponding index, such as “charlson10” or “quan_elixhauser10”.
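Conceptually, each index is a set of code lists, one per comorbidity, and a record is flagged when one of its codes matches a list. The sketch below shows one common matching rule, prefix matching on codes with the decimal removed; whether icdcomorbid matches by prefix or by exact code is an assumption not settled by the text above. toy_mapping and flag_by_prefix_sketch are hypothetical, and the two code lists are a tiny illustrative subset, not the package's Charlson or Quan-Elixhauser definitions.

```r
# Illustrative subset only; NOT the official index definitions.
toy_mapping <- list(
  congestive_heart_failure = c("I50"),
  hypertension_uncomplicated = c("I10")
)

# Hypothetical helper: one logical column per comorbidity in the mapping.
flag_by_prefix_sketch <- function(codes, mapping) {
  # Compare codes upper-cased and without decimal points
  clean <- toupper(gsub(".", "", codes, fixed = TRUE))
  flags <- lapply(mapping, function(prefixes) {
    # TRUE when the code starts with any listed prefix
    Reduce(`|`, lapply(prefixes, function(p) startsWith(clean, p)))
  })
  data.frame(code = codes, flags, stringsAsFactors = FALSE)
}

toy_flags <- flag_by_prefix_sketch(c("I50.9", "I10", "E11.9"), toy_mapping)
toy_flags
```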
If your dataset contains ICD-9 codes, you can use the icd9_to_comorbid function to calculate comorbidities.
# Example ICD-9 data
icd9_data <- data.frame(
patient_id = c(1, 1, 2, 2, 3),
icd9_code = c("4010", "2500", "4140", "4280", "4930")
)
# Map ICD-9 codes to comorbidities using Charlson index
mapping <- "charlson9"
comorbidities_icd9 <- icd9_to_comorbid(
df = icd9_data,
idx = "patient_id",
icd_cols = "icd9_code",
mapping = mapping,
batch_size = 2
)
# Display the comorbidity results
head(comorbidities_icd9)
#>   patient_id myocardial_infarction congestive_heart_failure
#> 1          1                 FALSE                    FALSE
#> 2          1                 FALSE                    FALSE
#> 3          2                 FALSE                    FALSE
#> 4          2                 FALSE                     TRUE
#> 5          3                 FALSE                    FALSE
#>   peripheral_vascular_disease cerebrovascular_disease dementia
#> 1                       FALSE                   FALSE    FALSE
#> 2                       FALSE                   FALSE    FALSE
#> 3                       FALSE                   FALSE    FALSE
#> 4                       FALSE                   FALSE    FALSE
#> 5                       FALSE                   FALSE    FALSE
#>   chronic_pulmonary_disease connective_tissue_disease_rheumatic_disease
#> 1                     FALSE                                       FALSE
#> 2                     FALSE                                       FALSE
#> 3                     FALSE                                       FALSE
#> 4                     FALSE                                       FALSE
#> 5                      TRUE                                       FALSE
#>   mild_liver_disease diabetes_wo_complications diabetes_w_complications
#> 1              FALSE                     FALSE                    FALSE
#> 2              FALSE                     FALSE                    FALSE
#> 3              FALSE                     FALSE                    FALSE
#> 4              FALSE                     FALSE                    FALSE
#> 5              FALSE                     FALSE                    FALSE
#>   paraplegia_and_hemiplegia renal_disease cancer
#> 1                     FALSE         FALSE  FALSE
#> 2                     FALSE         FALSE  FALSE
#> 3                     FALSE         FALSE  FALSE
#> 4                     FALSE         FALSE  FALSE
#> 5                     FALSE         FALSE  FALSE
#>   moderate_or_severe_liver_disease metastatic_carcinoma aids_hiv
#> 1                            FALSE                FALSE    FALSE
#> 2                            FALSE                FALSE    FALSE
#> 3                            FALSE                FALSE    FALSE
#> 4                            FALSE                FALSE    FALSE
#> 5                            FALSE                FALSE    FALSE
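The result has one row per input record, so patients 1 and 2 each appear twice. If you need one row per patient, one option is to treat a patient as having a comorbidity when any of their records is flagged, and collapse the rows with base R's aggregate(). The toy record_flags data frame below copies two columns from the output above; the any() rule is an analysis choice, not a description of package behavior.

```r
# Two columns copied from the output above (one row per input record)
record_flags <- data.frame(
  patient_id = c(1, 1, 2, 2, 3),
  congestive_heart_failure = c(FALSE, FALSE, FALSE, TRUE, FALSE),
  chronic_pulmonary_disease = c(FALSE, FALSE, FALSE, FALSE, TRUE)
)

# A patient has a comorbidity if any of their records is flagged
patient_flags <- aggregate(
  record_flags[setdiff(names(record_flags), "patient_id")],
  by = list(patient_id = record_flags$patient_id),
  FUN = any
)
patient_flags
```

Applied to the package result, the equivalent call would pass comorbidities_icd9 without its patient_id column as the first argument and comorbidities_icd9$patient_id in by, assuming every other column is logical.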
If your dataset contains ICD-10 codes, you can use the icd10_to_comorbid function to calculate comorbidities:
# Example data with ICD-10 codes
icd10_data <- data.frame(
patient_id = c(1, 1, 2, 2, 3),
icd_code = c("E11", "I10", "E11", "I50", "I21")
)
# Calculate comorbidities for ICD-10 data using the Quan-Elixhauser index
mapping <- "quan_elixhauser10"
icd10_comorbidities <- icd10_to_comorbid(
df = icd10_data,
idx = "patient_id",
icd_cols = "icd_code",
mapping = mapping,
batch_size = 2
)
# Display the comorbidity results
head(icd10_comorbidities)
#>   patient_id congestive_heart_failure cardiac_arrhythmia valvular_disease
#> 1          1                    FALSE              FALSE            FALSE
#> 2          1                    FALSE              FALSE            FALSE
#> 3          2                    FALSE              FALSE            FALSE
#> 4          2                     TRUE              FALSE            FALSE
#> 5          3                    FALSE              FALSE            FALSE
#>   pulmonary_circulation_disorder peripheral_vascular_disorder
#> 1                          FALSE                        FALSE
#> 2                          FALSE                        FALSE
#> 3                          FALSE                        FALSE
#> 4                          FALSE                        FALSE
#> 5                          FALSE                        FALSE
#>   hypertension_uncomplicated hypertension_complicated paralysis
#> 1                      FALSE                    FALSE     FALSE
#> 2                       TRUE                    FALSE     FALSE
#> 3                      FALSE                    FALSE     FALSE
#> 4                      FALSE                    FALSE     FALSE
#> 5                      FALSE                    FALSE     FALSE
#>   other_neurological_disorder chronic_pulmonary_disease diabetes_uncomplicated
#> 1                       FALSE                     FALSE                  FALSE
#> 2                       FALSE                     FALSE                  FALSE
#> 3                       FALSE                     FALSE                  FALSE
#> 4                       FALSE                     FALSE                  FALSE
#> 5                       FALSE                     FALSE                  FALSE
#>   diabetes_complicated hypothyroidism renal_failure liver_disease
#> 1                FALSE          FALSE         FALSE         FALSE
#> 2                FALSE          FALSE         FALSE         FALSE
#> 3                FALSE          FALSE         FALSE         FALSE
#> 4                FALSE          FALSE         FALSE         FALSE
#> 5                FALSE          FALSE         FALSE         FALSE
#>   peptic_ulcer_disease_excluding_bleeding aids_hiv lymphoma metastatic_cancer
#> 1                                   FALSE    FALSE    FALSE             FALSE
#> 2                                   FALSE    FALSE    FALSE             FALSE
#> 3                                   FALSE    FALSE    FALSE             FALSE
#> 4                                   FALSE    FALSE    FALSE             FALSE
#> 5                                   FALSE    FALSE    FALSE             FALSE
#>   solid_tumor_wo_metastasis rheumatoid_arhritis coagulopathy obesity
#> 1                     FALSE               FALSE        FALSE   FALSE
#> 2                     FALSE               FALSE        FALSE   FALSE
#> 3                     FALSE               FALSE        FALSE   FALSE
#> 4                     FALSE               FALSE        FALSE   FALSE
#> 5                     FALSE               FALSE        FALSE   FALSE
#>   weight_loss fluid_and_electrolyte_disorders blood_loss_anemia
#> 1       FALSE                           FALSE             FALSE
#> 2       FALSE                           FALSE             FALSE
#> 3       FALSE                           FALSE             FALSE
#> 4       FALSE                           FALSE             FALSE
#> 5       FALSE                           FALSE             FALSE
#>   deficiency_anemia alcohol_abuse drug_abuse psychoses depression
#> 1             FALSE         FALSE      FALSE     FALSE      FALSE
#> 2             FALSE         FALSE      FALSE     FALSE      FALSE
#> 3             FALSE         FALSE      FALSE     FALSE      FALSE
#> 4             FALSE         FALSE      FALSE     FALSE      FALSE
#> 5             FALSE         FALSE      FALSE     FALSE      FALSE
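Both comorbidity examples pass batch_size = 2. The general idea of batch processing can be sketched as splitting the rows into consecutive chunks, processing each chunk, and stacking the results; the usual motivation is to limit how much intermediate data is held at once. process_in_batches_sketch and toy_codes are hypothetical, and the package's internal batching may differ.

```r
# Hypothetical helper, not icdcomorbid's internal code: split the rows into
# consecutive chunks of batch_size, apply fun to each chunk, stack the results.
process_in_batches_sketch <- function(df, batch_size, fun) {
  # Batch number per row: rows 1..batch_size get 1, the next batch_size get 2, ...
  batch_id <- ceiling(seq_len(nrow(df)) / batch_size)
  pieces <- lapply(split(df, batch_id), fun)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

toy_codes <- data.frame(
  patient_id = c(1, 1, 2, 2, 3),
  icd_code = c("E11", "I10", "E11", "I50", "I21"),
  stringsAsFactors = FALSE
)

# Per-batch work for illustration: flag codes in the I50 category
batched <- process_in_batches_sketch(toy_codes, batch_size = 2, fun = function(b) {
  data.frame(patient_id = b$patient_id, i50 = startsWith(b$icd_code, "I50"))
})
batched
```

Because the chunks are cut by row position, one patient's rows can land in different batches (with batch_size = 3, patient 2's two rows would be split), so any per-patient summary belongs after the results are stacked.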
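You can also pass a custom mapping instead of a built-in index: a named list in which each name becomes an output column and each element lists the ICD codes for that condition.
# Custom mapping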
custom_mapping <- list(
"Hypertension" = c("4010", "4011", "4019"),
"Diabetes" = c("2500", "2501", "2502")
)
# Map ICD-9 codes to comorbidities using custom mapping
comorbidities_custom <- icd9_to_comorbid(
df = icd9_data,
idx = "patient_id",
icd_cols = "icd9_code",
mapping = custom_mapping,
batch_size = 2
)
# Display the comorbidity results
head(comorbidities_custom)
#>   patient_id Hypertension Diabetes
#> 1          1         TRUE    FALSE
#> 2          1        FALSE     TRUE
#> 3          2        FALSE    FALSE
#> 4          2        FALSE    FALSE
#> 5          3        FALSE    FALSE
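The custom mapping above spells out every four-digit code. If you would rather define conditions by category prefix, a hypothetical helper such as mapping_from_prefixes_sketch can expand prefixes into full code lists drawn from a vector of codes you already have (known_codes here is made up for illustration). The result has the same named-list shape as custom_mapping, so it could be passed as the mapping argument in the same way.

```r
# Hypothetical helper: build a custom mapping (named list of code vectors)
# by selecting the known codes that start with each condition's prefixes.
mapping_from_prefixes_sketch <- function(prefixes, known_codes) {
  lapply(prefixes, function(p) {
    # Anchor the match at the start of the code; prefixes are plain letters/digits
    pattern <- paste0("^(", paste(p, collapse = "|"), ")")
    sort(unique(grep(pattern, known_codes, value = TRUE)))
  })
}

# Codes to draw from (illustrative)
known_codes <- c("4010", "4011", "4019", "2500", "2501", "2502", "4140", "4280")

prefix_mapping <- mapping_from_prefixes_sketch(
  list(Hypertension = "401", Diabetes = "250"),
  known_codes
)
prefix_mapping
```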
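The episode_of_care function combines each patient's DAD (Discharge Abstract Database) and NACRS (National Ambulatory Care Reporting System) records into one chronological sequence and assigns each record an episode-of-care number, which is useful for analyzing patient treatment over time.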
# Example data with admit and discharge dates for DAD and NACRS
dad_data <- data.frame(
patient_id = c(1, 1, 2),
dad_admit = as.POSIXct(c("2023-01-01 10:00:00", "2023-02-01 09:00:00",
"2023-01-15 08:00:00"), tz="UTC"),
dad_dis = as.POSIXct(c("2023-01-10 15:00:00", "2023-02-10 14:00:00",
"2023-01-20 12:00:00"), tz="UTC")
)
nacrs_data <- data.frame(
patient_id = c(1, 2, 2),
nacrs_admit = as.POSIXct(c("2023-01-15 10:00:00", "2023-01-25 09:00:00",
"2023-03-01 08:00:00"), tz="UTC"),
nacrs_dis = as.POSIXct(c("2023-01-20 15:00:00", "2023-01-30 14:00:00",
"2023-03-05 12:00:00"), tz="UTC")
)
# Creating episodes of care
episodes <- episode_of_care(dad_data, nacrs_data, patient_id_col = "patient_id",
dad_visit_date_col = "dad_admit",
dad_exit_date_col = "dad_dis",
nacrs_visit_date_col = "nacrs_admit",
nacrs_exit_date_col = "nacrs_dis")
head(episodes)
#>   record_id patient_id           dad_admit             dad_dis
#> 1         1          1 2023-01-01 10:00:00 2023-01-10 15:00:00
#> 2         2          1                <NA>                <NA>
#> 3         3          1 2023-02-01 09:00:00 2023-02-10 14:00:00
#> 4         4          2 2023-01-15 08:00:00 2023-01-20 12:00:00
#> 5         5          2                <NA>                <NA>
#> 6         6          2                <NA>                <NA>
#>           nacrs_admit           nacrs_dis source episode_of_care
#> 1                <NA>                <NA>    DAD               1
#> 2 2023-01-15 10:00:00 2023-01-20 15:00:00  NACRS               2
#> 3                <NA>                <NA>    DAD               3
#> 4                <NA>                <NA>    DAD               1
#> 5 2023-01-25 09:00:00 2023-01-30 14:00:00  NACRS               2
#> 6 2023-03-01 08:00:00 2023-03-05 12:00:00  NACRS               3
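The example does not spell out how episode_of_care decides that two records belong to the same episode; in this data every record receives its own number. The sketch below shows one plausible approach under an explicit assumption: records from both sources are stacked, sorted per patient, and a new episode starts only when a record begins after all of that patient's earlier records have ended. stack_records_sketch, assign_episodes_sketch, utc, and the toy data are hypothetical, and the package's actual rule may differ.

```r
# Hypothetical helpers, not part of icdcomorbid. ASSUMED rule: a new episode
# starts only when a record begins after all earlier records have ended.
stack_records_sketch <- function(dad, nacrs) {
  # Give DAD and NACRS records shared column names and stack them
  rbind(
    data.frame(patient_id = dad$patient_id, admit = dad$dad_admit,
               discharge = dad$dad_dis, source = "DAD",
               stringsAsFactors = FALSE),
    data.frame(patient_id = nacrs$patient_id, admit = nacrs$nacrs_admit,
               discharge = nacrs$nacrs_dis, source = "NACRS",
               stringsAsFactors = FALSE)
  )
}

assign_episodes_sketch <- function(records) {
  # Sort each patient's records chronologically
  records <- records[order(records$patient_id, records$admit), ]
  rownames(records) <- NULL
  records$episode <- ave(
    seq_len(nrow(records)), records$patient_id,
    FUN = function(i) {
      admit <- as.numeric(records$admit[i])
      discharge <- as.numeric(records$discharge[i])
      # Latest discharge among the patient's earlier records (-Inf for the first)
      latest_end <- c(-Inf, cummax(discharge)[-length(i)])
      # Each record that starts after that point opens a new episode
      cumsum(admit > latest_end)
    }
  )
  records
}

utc <- function(x) as.POSIXct(x, tz = "UTC")
dad_toy <- data.frame(
  patient_id = c(1, 1, 2),
  dad_admit = utc(c("2023-01-01 10:00:00", "2023-02-01 09:00:00", "2023-01-15 08:00:00")),
  dad_dis = utc(c("2023-01-10 15:00:00", "2023-02-10 14:00:00", "2023-01-20 12:00:00"))
)
# Patient 1's NACRS visit on 2023-01-05 falls inside the first DAD stay
nacrs_toy <- data.frame(
  patient_id = c(1, 2, 2),
  nacrs_admit = utc(c("2023-01-05 10:00:00", "2023-01-25 09:00:00", "2023-03-01 08:00:00")),
  nacrs_dis = utc(c("2023-01-05 15:00:00", "2023-01-30 14:00:00", "2023-03-05 12:00:00"))
)

episodes_toy <- assign_episodes_sketch(stack_records_sketch(dad_toy, nacrs_toy))
episodes_toy
```

Under this rule, patient 1's overlapping NACRS visit joins the first DAD stay's episode, while every non-overlapping record opens a new one.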