Modular Approach

Introduction

This article describes how to cut study SDTM data using a modular approach, in which each step of the cut is run separately so that any further study- or project-specific customization can be added between steps.

Programming Flow

Read in Data

To start, all SDTM datasets to be cut must be stored in a single named list, with one element per domain.

library(datacutr)
library(admiraldev)
library(dplyr)
library(lubridate)
library(stringr)
library(purrr)
library(rlang)

# Example SDTM datasets shipped with datacutr, named by domain
source_data <- list(
  ds = datacutr_ds, dm = datacutr_dm, ae = datacutr_ae, sc = datacutr_sc,
  lb = datacutr_lb, fa = datacutr_fa, ts = datacutr_ts
)

Create DCUT Dataset

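The next step is to create the DCUT dataset. It holds one record per subject whose DS record meets the filter condition (here, randomization) on or before the cut date, together with the datacut date (DCUTDTC), datacut date-time (DCUTDTM) and datacut description (DCUTDESC).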

dcut <- create_dcut(
  dataset_ds = source_data$ds,
  ds_date_var = DSSTDTC,
  filter = DSDECOD == "RANDOMIZATION",
  cut_date = "2022-06-04",
  cut_description = "Clinical Cutoff Date"
)

| USUBJID | DCUTDTC | DCUTDTM | DCUTDESC |
|---|---|---|---|
| AB12345-001 | 2022-06-04 | 2022-06-04 23:59:59 | Clinical Cutoff Date |
| AB12345-002 | 2022-06-04 | 2022-06-04 23:59:59 | Clinical Cutoff Date |
| AB12345-003 | 2022-06-04 | 2022-06-04 23:59:59 | Clinical Cutoff Date |
| AB12345-004 | 2022-06-04 | 2022-06-04 23:59:59 | Clinical Cutoff Date |
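To make the DCUT derivation concrete, the sketch below re-creates its core idea in base R on hypothetical toy DS data. It assumes complete YYYY-MM-DD dates and that the retained subjects are those whose filtered DS record falls on or before the cut date. `toy_create_dcut()` is a hypothetical helper for illustration only, not part of datacutr; in a real study `create_dcut()` should be used.

```r
# Hypothetical toy DS data (illustration only)
ds <- data.frame(
  USUBJID = sprintf("AB12345-%03d", 1:5),
  DSDECOD = "RANDOMIZATION",
  DSSTDTC = c("2022-02-01", "2022-03-01", "2022-04-01", "2022-05-01", "2022-08-01"),
  stringsAsFactors = FALSE
)

# Hypothetical, simplified stand-in for datacutr::create_dcut()
toy_create_dcut <- function(ds, decod, cut_date, cut_description) {
  # Parse complete dates; unparseable or missing dates become NA
  dates <- as.Date(substr(ds$DSSTDTC, 1, 10), format = "%Y-%m-%d")
  # Keep records matching the filter that fall on or before the cut date
  keep <- which(ds$DSDECOD %in% decod & dates <= as.Date(cut_date))
  ids <- unique(ds$USUBJID[keep])
  n <- length(ids)
  data.frame(
    USUBJID  = ids,
    DCUTDTC  = rep(cut_date, n),
    # The cut date-time is taken as the end of the cut day
    DCUTDTM  = rep(as.POSIXct(paste(cut_date, "23:59:59"), tz = "UTC"), n),
    DCUTDESC = rep(cut_description, n),
    stringsAsFactors = FALSE
  )
}

dcut_toy <- toy_create_dcut(ds, "RANDOMIZATION", "2022-06-04", "Clinical Cutoff Date")
print(dcut_toy)
```

With this toy input, subjects 001 to 004 are kept and subject 005, whose record falls after the cut date, is dropped.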

Preprocess Datasets

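Any pre-processing of datasets should be done next. For example, FA has two date variables (FASTDTC and FADTC), so we derive a single temporary date variable, DCUT_TEMP_FAXDTC, that takes FASTDTC when it is populated and otherwise falls back to FADTC.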

source_data$fa <- source_data$fa %>%
  mutate(DCUT_TEMP_FAXDTC = case_when(
    # Use the start date if populated, otherwise fall back to the collection date
    FASTDTC != "" ~ FASTDTC,
    FADTC != "" ~ FADTC,
    TRUE ~ NA_character_
  ))
USUBJID FASTDTC FADTC DCUT_TEMP_FAXDTC
AB12345-001 2022-06-01 2022-06-01
AB12345-002 2022-06-30 2022-06-30
AB12345-003 2022-07-01 2022-07-01
AB12345-004 2022-05-04 2022-05-04
AB12345-005 2022-12-01 2022-12-01
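The same "first populated date wins" rule can be written in base R as a small variadic helper, which makes it easy to add further fallback date variables. This is a sketch on hypothetical toy FA records; `first_populated()` is a hypothetical helper, not a datacutr function. Like the `case_when()` above, it treats both `NA` and `""` as missing and checks the arguments in order.

```r
# Hypothetical toy FA records (illustration only)
fa <- data.frame(
  USUBJID = c("AB12345-001", "AB12345-002", "AB12345-003"),
  FASTDTC = c("2022-06-01", "", NA),
  FADTC   = c("", "2022-06-30", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row, return the first argument that is
# neither NA nor blank; rows with no populated value stay NA
first_populated <- function(...) {
  vals <- list(...)
  out <- rep(NA_character_, length(vals[[1]]))
  for (v in vals) {
    fill <- is.na(out) & !is.na(v) & v != ""
    out[fill] <- v[fill]
  }
  out
}

fa$DCUT_TEMP_FAXDTC <- first_populated(fa$FASTDTC, fa$FADTC)
print(fa)
```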

Specify Cut Types

Next, we specify the cut type for each dataset (patient cut, date cut or no cut) and, for date-cut datasets, which date variable the cut should use. DM is not listed here because it receives its own special cut.

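# Datasets cut by patient only
patient_cut_list <- c("sc", "ds")

# Datasets cut by date: one row per dataset, giving the dataset name
# and the date variable to compare against the cut date
date_cut_list <- rbind(
  c("ae", "AESTDTC"),
  c("lb", "LBDTC"),
  c("fa", "DCUT_TEMP_FAXDTC")
)

# Datasets kept as they are
no_cut_list <- list(ts = source_data$ts)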
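Because each dataset is assigned its cut type by hand, it can be worth checking that every dataset in the source list is covered exactly once. The sketch below is a hypothetical guard, not part of datacutr: it uses empty stand-in data frames (only the list names matter) and treats DM as handled by the special DM cut. Note that `date_cut_list` is a two-column character matrix, so its first column holds the dataset names.

```r
# Hypothetical stand-in for source_data: only the names matter here
source_data <- list(ds = data.frame(), dm = data.frame(), ae = data.frame(),
                    sc = data.frame(), lb = data.frame(), fa = data.frame(),
                    ts = data.frame())

patient_cut_list <- c("sc", "ds")
date_cut_list <- rbind(
  c("ae", "AESTDTC"),
  c("lb", "LBDTC"),
  c("fa", "DCUT_TEMP_FAXDTC")
)
no_cut_list <- list(ts = source_data$ts)

# DM gets the special DM cut, so it is counted as assigned
assigned <- c(patient_cut_list, date_cut_list[, 1], names(no_cut_list), "dm")

unassigned <- setdiff(names(source_data), assigned)  # datasets with no cut type
assigned_twice <- assigned[duplicated(assigned)]     # datasets with >1 cut type

if (length(unassigned) > 0 || length(assigned_twice) > 0) {
  stop("Check cut types. Unassigned: ", paste(unassigned, collapse = ", "),
       "; assigned more than once: ", paste(assigned_twice, collapse = ", "))
}
```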

Patient Cut

Next, we apply the patient cut, which subsets each dataset to the subjects present in DCUT.

patient_cut_data <- lapply(
  source_data[patient_cut_list], pt_cut,
  dataset_cut = dcut
)

This adds a temporary flag variable, DCUT_TEMP_REMOVE, which is "Y" for observations that will be removed. For example, in SC, subject AB12345-005 is not in DCUT, so its observation is flagged:

| USUBJID | SCORRES | DCUT_TEMP_REMOVE |
|---|---|---|
| AB12345-001 | A | NA |
| AB12345-002 | B | NA |
| AB12345-003 | C | NA |
| AB12345-004 | D | NA |
| AB12345-005 | E | Y |
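The flagging rule behind the patient cut can be sketched in a few lines of base R: any observation whose USUBJID is absent from DCUT is flagged "Y". This uses hypothetical toy data mirroring the SC example, and `toy_pt_cut()` is a hypothetical helper for illustration, not datacutr's `pt_cut()`.

```r
# Hypothetical toy DCUT and SC data (illustration only)
dcut <- data.frame(USUBJID = sprintf("AB12345-%03d", 1:4), stringsAsFactors = FALSE)
sc <- data.frame(
  USUBJID = sprintf("AB12345-%03d", 1:5),
  SCORRES = c("A", "B", "C", "D", "E"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flag observations for subjects not present in DCUT
toy_pt_cut <- function(dataset_sdtm, dataset_cut) {
  in_cut <- dataset_sdtm$USUBJID %in% dataset_cut$USUBJID
  dataset_sdtm$DCUT_TEMP_REMOVE <- ifelse(in_cut, NA_character_, "Y")
  dataset_sdtm
}

sc_flagged <- toy_pt_cut(sc, dcut)
print(sc_flagged)
```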

Date Cut

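Next, we apply the date cut, which keeps only observations whose date variable falls on or before the subject's cut date-time in DCUT.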

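date_cut_data <- pmap(
  .l = list(
    # Datasets to cut, selected by name from the first column of date_cut_list
    dataset_sdtm = source_data[date_cut_list[, 1]],
    # Date variable names, converted to symbols for tidy evaluation
    sdtm_date_var = syms(date_cut_list[, 2])
  ),
  .f = date_cut,
  dataset_cut = dcut,
  cut_var = DCUTDTM
)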

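As with the patient cut, this adds temporary variables, including the DCUT_TEMP_REMOVE flag, indicating which observations will be removed. For example, in AE, events starting after the cut date are flagged, as is the event for subject AB12345-005, who is not in DCUT: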

| USUBJID | AETERM | AESTDTC | DCUT_TEMP_SDTM_DATE | DCUT_TEMP_DCUTDTM | DCUT_TEMP_REMOVE |
|---|---|---|---|---|---|
| AB12345-001 | AE1 | 2022-06-01 | 2022-06-01 | 2022-06-04 23:59:59 | NA |
| AB12345-002 | AE2 | 2022-06-30 | 2022-06-30 | 2022-06-04 23:59:59 | Y |
| AB12345-003 | AE3 | 2022-07-01 | 2022-07-01 | 2022-06-04 23:59:59 | Y |
| AB12345-004 | AE4 | 2022-05-04 | 2022-05-04 | 2022-06-04 23:59:59 | NA |
| AB12345-005 | AE5 | 2022-12-01 | 2022-12-01 | NA | Y |
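The date-cut rule illustrated by the AE example can be sketched in base R: look up each subject's cut date-time, then flag the observation if the subject is not in DCUT or if its date is later than the cut date-time. This is a simplified sketch on hypothetical toy data; `toy_date_cut()` is a hypothetical helper, it only handles complete YYYY-MM-DD dates (read as midnight), and it keeps observations with a missing date. How datacutr's `date_cut()` treats partial or missing dates is not covered here.

```r
# Hypothetical toy DCUT and AE data (illustration only)
dcut <- data.frame(
  USUBJID = sprintf("AB12345-%03d", 1:4),
  DCUTDTM = rep(as.POSIXct("2022-06-04 23:59:59", tz = "UTC"), 4),
  stringsAsFactors = FALSE
)
ae <- data.frame(
  USUBJID = sprintf("AB12345-%03d", 1:5),
  AETERM  = paste0("AE", 1:5),
  AESTDTC = c("2022-06-01", "2022-06-30", "2022-07-01", "2022-05-04", "2022-12-01"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flag observations dated after the subject's cut
toy_date_cut <- function(dataset_sdtm, date_var, dataset_cut) {
  # Subject-level cut date-time (NA when the subject is not in DCUT)
  cut_dtm <- dataset_cut$DCUTDTM[match(dataset_sdtm$USUBJID, dataset_cut$USUBJID)]
  # Complete ISO 8601 dates become date-times at midnight; others become NA
  sdtm_dtm <- as.POSIXct(dataset_sdtm[[date_var]], format = "%Y-%m-%d", tz = "UTC")
  after_cut <- !is.na(sdtm_dtm) & !is.na(cut_dtm) & sdtm_dtm > cut_dtm
  dataset_sdtm$DCUT_TEMP_REMOVE <- ifelse(is.na(cut_dtm) | after_cut, "Y", NA_character_)
  dataset_sdtm
}

ae_flagged <- toy_date_cut(ae, "AESTDTC", dcut)
print(ae_flagged)
```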

DM Cut

Then we apply the special DM cut, which performs the patient cut on DM and also updates the death-related variables for subjects whose death occurred after the cut date.

dm_cut <- special_dm_cut(
  dataset_dm = source_data$dm,
  dataset_cut = dcut,
  cut_var = DCUTDTM
)

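This adds temporary variables flagging the DM records to be removed and the death records that would change as a result of applying the datacut. For example, subject AB12345-003 died after the cut date, so DCUT_TEMP_DTHCHANGE is "Y":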

| USUBJID | DTHFL | DTHDTC | DCUT_TEMP_REMOVE | DCUT_TEMP_DTHDT | DCUT_TEMP_DCUTDTM | DCUT_TEMP_DTHCHANGE |
|---|---|---|---|---|---|---|
| AB12345-001 | Y | 2022-06-01 | NA | 2022-06-01 | 2022-06-04 23:59:59 | NA |
| AB12345-002 | | | NA | NA | 2022-06-04 23:59:59 | NA |
| AB12345-003 | Y | 2022-07-01 | NA | 2022-07-01 | 2022-06-04 23:59:59 | Y |
| AB12345-004 | | | NA | NA | 2022-06-04 23:59:59 | NA |
| AB12345-005 | Y | 2022-12-01 | Y | 2022-12-01 | NA | NA |
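The two DM flags in the example can be reproduced with a base R sketch: subjects missing from DCUT are flagged for removal, and subjects who remain but whose death date is later than their cut date-time are flagged for a death-information change. This uses hypothetical toy data; `toy_dm_flags()` is a hypothetical helper rather than datacutr's `special_dm_cut()`, and it only handles complete YYYY-MM-DD death dates.

```r
# Hypothetical toy DCUT and DM data (illustration only)
dcut <- data.frame(
  USUBJID = sprintf("AB12345-%03d", 1:4),
  DCUTDTM = rep(as.POSIXct("2022-06-04 23:59:59", tz = "UTC"), 4),
  stringsAsFactors = FALSE
)
dm <- data.frame(
  USUBJID = sprintf("AB12345-%03d", 1:5),
  DTHFL   = c("Y", "", "Y", "", "Y"),
  DTHDTC  = c("2022-06-01", "", "2022-07-01", "", "2022-12-01"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: derive the removal and death-change flags for DM
toy_dm_flags <- function(dm, dataset_cut) {
  cut_dtm <- dataset_cut$DCUTDTM[match(dm$USUBJID, dataset_cut$USUBJID)]
  dth_dtm <- as.POSIXct(dm$DTHDTC, format = "%Y-%m-%d", tz = "UTC")
  # Subjects not in DCUT are removed entirely (patient cut)
  dm$DCUT_TEMP_REMOVE <- ifelse(is.na(cut_dtm), "Y", NA_character_)
  # Kept subjects who died after their cut date-time need their death info changed
  dth_after_cut <- !is.na(cut_dtm) & !is.na(dth_dtm) & dth_dtm > cut_dtm
  dm$DCUT_TEMP_DTHCHANGE <- ifelse(dth_after_cut, "Y", NA_character_)
  dm
}

dm_flagged <- toy_dm_flags(dm, dcut)
print(dm_flagged)
```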

Apply Cut

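Next, we create the R Markdown (RMD) report summarizing which patients and observations will be cut, and then apply the cut. Applying the cut removes all observations flagged for removal in DCUT_TEMP_REMOVE and clears the death-related information where DCUT_TEMP_DTHCHANGE is "Y".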

# Combine all flagged datasets and apply the cut to each one
cut_data <- purrr::map(
  c(patient_cut_data, date_cut_data, list(dm = dm_cut)),
  apply_cut,
  dcutvar = DCUT_TEMP_REMOVE,
  dthchangevar = DCUT_TEMP_DTHCHANGE
)
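The effect of applying the cut can be sketched in base R: death-related variables are blanked where the death-change flag is "Y", flagged observations are dropped, and the temporary DCUT_TEMP_ variables are removed. This is a hypothetical sketch on toy DM data; `toy_apply_cut()` is not datacutr's `apply_cut()`, and the choice of death variables (DTHFL, DTHDTC) and blanking them to empty strings are assumptions made for illustration.

```r
# Hypothetical toy DM data with flags already derived (illustration only)
dm <- data.frame(
  USUBJID = sprintf("AB12345-%03d", 1:5),
  DTHFL   = c("Y", "", "Y", "", "Y"),
  DTHDTC  = c("2022-06-01", "", "2022-07-01", "", "2022-12-01"),
  DCUT_TEMP_REMOVE    = c(NA, NA, NA, NA, "Y"),
  DCUT_TEMP_DTHCHANGE = c(NA, NA, "Y", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper; dth_vars is an assumption, not datacutr's definition
toy_apply_cut <- function(dataset, remove_var = "DCUT_TEMP_REMOVE",
                          dthchange_var = "DCUT_TEMP_DTHCHANGE",
                          dth_vars = c("DTHFL", "DTHDTC")) {
  # Blank death-related variables where the death occurred after the cut
  if (dthchange_var %in% names(dataset)) {
    chg <- !is.na(dataset[[dthchange_var]]) & dataset[[dthchange_var]] == "Y"
    for (v in intersect(dth_vars, names(dataset))) {
      dataset[[v]][chg] <- ""
    }
  }
  # Drop observations flagged for removal
  if (remove_var %in% names(dataset)) {
    keep <- is.na(dataset[[remove_var]]) | dataset[[remove_var]] != "Y"
    dataset <- dataset[keep, , drop = FALSE]
  }
  # Remove the temporary DCUT_TEMP_* variables
  dataset[, !startsWith(names(dataset), "DCUT_TEMP"), drop = FALSE]
}

dm_cut_toy <- toy_apply_cut(dm)
print(dm_cut_toy)
```

The existence checks on the flag variables let the same helper be mapped over every dataset, including those that have no death-change flag.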

Output Final List of Cut Datasets

Finally, we create the final list of all cut SDTM data, adding back the datasets that needed no cut and the DCUT dataset itself.

final_data <- c(cut_data, no_cut_list, list(dcut = dcut))
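A quick check on the final list can confirm that every source dataset appears exactly once alongside DCUT. This is a hypothetical sanity check, not part of datacutr; the stand-in objects below carry only the list names, since the names are all the check needs.

```r
# Hypothetical stand-ins: only the list names matter for this check
source_names <- c("ds", "dm", "ae", "sc", "lb", "fa", "ts")
cut_data <- setNames(vector("list", 6), c("sc", "ds", "ae", "lb", "fa", "dm"))
no_cut_list <- list(ts = data.frame())
dcut <- data.frame()

final_data <- c(cut_data, no_cut_list, list(dcut = dcut))

# Every source dataset should appear exactly once, plus DCUT itself
missing_datasets <- setdiff(c(source_names, "dcut"), names(final_data))
duplicate_names <- names(final_data)[duplicated(names(final_data))]
```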