exceldata

Lifecycle: Stable CRAN status metacran downloads R-CMD-check

The goal of exceldata is to facilitate the use of Excel as a data entry tool for reproducible research. This package provides tools to automate data cleaning and recoding of data by requiring a data dictionary to accompany the Excel data. A macro-enabled template file has been created to facilitate clean data entry including validation rules for live data checking.

To download the Excel Template Click Here and then click the download button

To view data in the required format without macros Click Here and download

Installation

You can install the released version of exceldata from CRAN with:

install.packages("exceldata")

And the development version from GitHub with:

devtools::install_github("biostatsPMH/exceldata")

Documentation

Online Documentation

PDF Documentation: Click Here and then click on Download

Using the Excel Template

Example

Example of importing the data and producing univariate plots to screen for outliers.

library(exceldata)

exampleDataFile <- system.file("extdata", "exampleData.xlsx", package = "exceldata")
import <- importExcelData(exampleDataFile,dictionarySheet = 'DataDictionary',dataSheet = 'DataEntry')
#> No errors in data.
#> File import complete. Details of variables created are in the logfile:  exampleData17Nov23.log

# The imported data dictionary 
dictionary <- import$dictionary
head(dictionary)
#> # A tibble: 6 × 6
#>   VariableName Description                Type      Minimum    Maximum Levels   
#>   <chr>        <chr>                      <chr>     <chr>      <chr>   <chr>    
#> 1 ID           unique patient identifier  character <NA>       <NA>    <NA>     
#> 2 Age          Patient's age at diagnosis numeric   40         110     <NA>     
#> 3 Gender       Patient's gender           category  <NA>       <NA>    m=Male,f…
#> 4 T_Stage      Tumour Staging             category  <NA>       <NA>    T0,T1,T2…
#> 5 DxDate       Date of Diagnosis          date      2019-01-01 today   <NA>     
#> 6 ECOG         Performance Status         integer   0          5       <NA>

# The imported data, with calculated variables
data <- import$data
head(data)
#> # A tibble: 6 × 9
#>   ID      Age Gender T_Stage DxDate      ECOG Date_Death Date_LFU   T0_Stg
#>   <chr> <dbl> <fct>  <fct>   <date>     <int> <date>     <date>     <fct> 
#> 1 1        77 Female T2      2019-06-05     4 2021-08-06 NA         T1up  
#> 2 2        58 Female T2      2019-09-26     2 2020-06-06 NA         T1up  
#> 3 3        66 Female T4      2019-07-19     0 NA         2020-07-20 T1up  
#> 4 4        72 Female T4      2019-12-17     4 NA         2021-07-04 T1up  
#> 5 5        52 Female T2      2019-06-07     1 2020-12-04 NA         T1up  
#> 6 6        72 Female T1      2021-02-10     2 2021-10-10 NA         T1up

# Simple univariate plots with outliers 
plots <- plotVariables(data,dictionary,IDvar = 'ID',showOutliers = T)
# Not Run: Show plots in viewer
# plots