Introduction to sendigR

2022-08-16

The purpose of the sendigR package is to extract data from a set of nonclinical studies stored in CDISC SEND format in a database to be used for cross study analysis. It can either be done in a script by execution of a set of functions from the package to extract data for further processing or by execution an encapsulated R Shiny application.

The package supports currently these types of databases:

This vignettes introduces how to use sendigR to:

Familiarity with the CDISC SEND data model and at least basic knowledge about animal studies is necessary to follow this vignette and to use the package.

Getting started

Before we are ready to use the functions in the package, we must ensure that a minimum of prerequisites are fulfilled.

Package dependencies

Beside the required packages listed in the Imports section of DESCRIPTION, one or more additional package(s) must be installed:

Database

If an existing SEND database is to be accessed, it must contain a set of tables and columns representing the domains and variables described in the SEND IG version 3.0 or 3.1 - or a union of the two versions.

It is required that all names and basic data types (i.e. numerical or character) are as described the the SEND IG.

Study data to import

If study data are to be imported into a SQLite database, data for each study must be located in a separate folder as SAS transport files - also konw as XPT files.

Multiple studies may be located together in a folder hierarchy as described below.

SEND controlled terminology

Several of the function in the packages uses CDISC SEND terminology code list to verify validity of the SEND data. The package includes the values for the relevant code lists extracted from the newest CT version at build time. If it is relevant to use another version - newer or older - this version must be downloaded in Excel format from NCI or CDISC library archives and saved in any folder which is accessible from the R script(s) using this package.

Function overview.

The package contains 5 sets of functions covering different areas - each of these are described below, including detailed examples.

Initiation and closure of SEND environment

Before any use of the package’s function, a database must be opened and the database must be closed again as the final action.

Function Description
initEnvironment Initialize the environment. It must be executed as the absolute first function before any other package functions. It opens an existing or create a new empty database and returns a handle to the database, which must be used in all following function calls.
disconnectDB Close the open database. It must be called as the absolute last function in a session.

The call of initEnvironment returns a handle with a reference to the open database and pointers to low level functions accessing the specific type of database. This handle must be given as the first input parameter in the execution of all other function in the package. It is possible to open and access multiple databases - also of different types - in the same R session. Just execute initEnvironment for each relevant database and save the returned db handles in each distinct variable.

Examples in the following sections demonstrates the use of these functions in a complete work flow.

Build a SQLite SEND database

It is possible to build an SQLite database with SEND data imported from a set of studies.

Function Description
dbCreateSchema Create a SEND schema, i.e. all the tables to represent the domains documented in SEND IG, in an open and empty database
dbCreateIndexes Create a set of indexes on the tables in a SEND database to optimize performance of extraction of data
dbImportOneStudy Import SEND study data in SAS xport format into a SEND database from a single study folder
dbImportStudies Import SEND study data in SAS xport format into a SEND database from a hierarchy study folders
dbDeleteStudies Delete one or more studies in SEND database

It should be possible to import data into an existing SQLite SEND database using dbImportOneStudy and dbImportStudies, even though the tables are not created by dbCreateSchema. It is possible to delete studies in any SQLite SEND database dbDeleteStudies - whether or not the tables are created by dbCreateSchema.

To make it possible to import data for a set of studies with dbImportStudies, the input data should be saved in folder structure like this:

/path/to/SEND/datasets  
+-- study01  
   |   +-- ts.xpt  
   |   +-- dm.xpt  
   |   +-- ex.xpt  
   |   +-- etc.  
+-- study02   
   |   +-- ts.xpt  
   |   +-- dm.xpt  
   |   +-- ex.xpt  
   |   +-- etc.  
+-- proj1234
    +-- study11
        |   +-- ts.xpt  
        |   +-- dm.xpt  
        |   +-- ex.xpt  
        |   +-- etc.
    +-- study12
        |   +-- ts.xpt  
        |   +-- dm.xpt  
        |   +-- ex.xpt  
        |   +-- etc.
+-- etc.

It is recommended to download DB Browser for SQLite to look at the data imported into the database.

Examples

Create a new database, create all the SEND domain tables and import data for a set of studies XPT files saved in a folder hierarchy - create a set of indexes to optimize data extraction performance for the sendigR functions:

# The database is created by the call of  initEnvironment with the parameter dbCreate = TRUE:
dbToken <- initEnvironment(dbType='sqlite', 
                           dbPath='/path/to/db/send.db', 
                           dbCreate=TRUE)

# The tables must be created before any study can be imported
dbCreateSchema(dbToken)
                             
status <- dbImportStudies(dbToken, '/path/to/SEND/datasets', 
                          # Print contiously the status for import each study:
                          verbose = TRUE,
                          # sand save the status in a log file:
                          logFilePath = '/path/to/log file')

# Create a set of indexes to increase query performance for the data extraction functions
# - they may be created before of after import of data
dbCreateteIndexes(dbToken)

disconnectDB(dbToken)

Open an existing database and import data for a single study, allow it to replace potential existing data for the the same study in the database. Delete another study from the database:

dbToken <- initEnvironment(dbType='sqlite', dbPath='/path/to/db/send.db')

dbImportOneStudy(dbToken, '/path/to/SEND/datasets/proj1234/study11', overWrite=TRUE)

dbDeleteStudies(dbToken, 'study99')

disconnectDB(dbToken)

Extract and filter SEND data

These functions are divided into different groups:

The functions rely on the data.table package - and all input/output sets of data are data tables.

The complete list of functions:

Function Description
genericQuery Execute database query and returns fetched rows
getStudiesSDESIGN Extract a list of SEND studies with a specified study design
getStudiesSTSTDTC Extract a list of SEND studies with study start date within a specified interval
getControlSubj Extract a list of control animals for a list of studies
getSubjRoute Extract the set of animals of the specified route of administration - or just add actual route of administration for each animal.
getSubjSpeciesStrain Extract the set of animals of the specified species and strain - or just add the species and strain for each animal.
getSubjSex Extract the set of animals of the specified sex - or just add the sex of each animal.
getSubjData Extract data from a subject level domain.
getFindingsPhase Extract a set of findings for a specified study phase - or just add phase for each animal
getFindingsSubjAge Add the subject age at finding time - and optionally extract the set of findings within a specified range of ages
gen_vocab Create json file for vocabulary mappings
standardize_file Standardizes SEND xpt files using CDISC controlled terminologies

All the functions have an input parameter called inclUncertain, default value is FALSE.
If a function which is doing any kind of filtering is executed with:

As mentioned above, some of the functions may be executed with no filter conditions just to add the relevant attributes to a set of data, e.g. a SEX variable is added to the output data set by execution of getSubjSex. In these cases, the additional variable is populated with the found value for all rows, also if the value cannot be confidently identified - e.g. when a DM.SEX value like ‘Male’ does not match any value in the CDISC SEND CT code list value SEX.
These functions have an input parameter called noFilterReportUncertain, which is equivalent to the inclUncertain parameter described above.
If one of these functions is executed with no filter condition and noFilterReportUncertain = FALSE (default), no explanation of whether the identified values are confident or not is included.
If a function is executed with no filter condition and noFilterReportUncertain = TRUE, a NOT_VALID_MSG variable is included in the output data set with an explanation of the reason for not confident values which may for found.
The format of this variable is equivalent to the UNCERTAIN_MSG variable described above.

The intention with these functions are to build a work flow or pipeline to extract set(s) of animals, findings or other kinds of subject level data to be used for further analysis or similar - see an example below.

Example

Extract data to make this analysis:

What are the most common microscopic findings in control Sprague-Dawley male rats 10 to 14 weeks old at finding time that were dosed by oral or oral gavage. Include parallel studies after 2016 and findings from the treatment phase. Include uncertain records.

db <- initEnvironment(dbType='sqlite',
                      dbPath='/path/to/db/send.db')

# Extract the relevant set of studies into data.table 'studies'
studies <-
  getStudiesSDESIGN(db,
                    studyDesignFilter = 'PARALLEL',
                    inclUncertain = TRUE) %>%
  getStudiesSTSTDTC(db, ., 
                    fromDTC = '2016',
                    inclUncertain = TRUE)

# Extract the complete set of control animals for the set of extracted studies 
# into data.table 'controlAnimals':
controlAnimals <-
  getControlSubj(db, studies,
                 inclUncertain = TRUE)

# Extract all 
#  1: the males
#  2: Sprague-Dawley rats
#  3: animals dosed by oral or oral gavage
# from the set of control animals into data.table 'animals':
animals <- 
  getSubjSex(db, controlAnimals, 
              sexFilter = 'M',
              inclUncertain = TRUE) %>%
  getSubjSpeciesStrain(db, .,
                       speciesFilter = 'RAT',
                       strainFilter = 'Sprague-Dawley',
                       inclUncertain = TRUE) %>%
  getSubjRoute(db, .,
               routeFilter = c('ORAL', 'ORAL GAVAGE'),
               inclUncertain = TRUE)

# Extract microscopic findings
#  1: all MI rows for the extrated set of animals
#  2: keep the MI rows from the treatment phase
#  3: keep the MI rows where the animals are between 10 and 14 weeks at the time 
#     of finding
# into data.table 'dataMI':
dataMI <- 
  getSubjData(db, animals, 'MI') %>%
  getFindingsPhase(db, .,
                   phaseFilter = "treatment",
                   inclUncertain = TRUE) %>%
  getFindingsSubjAge(db, .,
                     fromAge = '10w',
                     toAge = '14w',
                     inclUncertain = TRUE)
                     
disconnectDB(db)

The resulting data.table dataMI contains all the extracted findings fulfilling the wanted conditions and can be used for further analysis.

It includes both findings

  • for animals which may have been identified as uncertain by one of the study or subject level functions - this level of uncertainties are described in the animal.UNCERTAIN_MSG column
  • and findings which may have been identified as uncertain by the findings level functions- this level of uncertainties are described in the dataMI.UNCERTAIN_MSG column.

Execute sendDashboard app

The sendigR package contains an encapsulated Shiny app - SendDashboard. The app enables the user to search and extract historical control data in a SEND database.

Function Description
execSendDashboard Starts the SendDashboard app

Example

Connect to the database, start the app. Remember to disconnect from database when leaving the app.

library(sendigR)
dbToken <- initEnvironment(dbType='sqlite', dbPath='/path/to/db/send.db')

execSendDashboard(dbToken)

disconnectDB(dbToken)