Many studies include data from assays which have not been integrated into the DataSpace. Some of these are available as “Non-Integrated Datasets,” which can be downloaded from the app as a zip file.
DataSpaceR provides an interface for accessing non-integrated data from studies where it is available.
Methods on the DataSpace Study object allow you to see what non-integrated data may be available before downloading it. We will be using HVTN 505 as an example.
library(DataSpaceR)
con <- connectDS()
vtn505 <- con$getStudy("vtn505")
vtn505
#> <DataSpaceStudy>
#>   Study: vtn505
#>   URL: https://dataspace.cavd.org/CAVD/vtn505
#>   Available datasets:
#>     - Binding Ab multiplex assay
#>     - Demographics
#>     - Intracellular Cytokine Staining
#>     - Neutralizing antibody
#>   Available non-integrated datasets:
#>     - ADCP
#>     - Demographics (Supplemental)
#>     - Fc Array
The print method on the study object lists the available non-integrated datasets. The availableDatasets property gives more detail about each dataset, with the integrated field indicating whether the data is integrated. For non-integrated data, the value of n is NA until the dataset has been loaded.
| name | label | n | integrated |
|---|---|---|---|
| BAMA | Binding Ab multiplex assay | 10260 | TRUE |
| ICS | Intracellular Cytokine Staining | 22684 | TRUE |
| DEM SUPP | Demographics (Supplemental) | NA | FALSE |
| Fc Array | Fc Array | NA | FALSE |
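The integrated field makes it straightforward to pick out the non-integrated datasets programmatically. The sketch below uses a small stand-in data frame that copies the four rows shown above, so it runs without a DataSpace connection. The column names name and label are assumptions about the table's header. With a live study object you would use vtn505$availableDatasets in place of the stand-in.

```r
# Stand-in for vtn505$availableDatasets, copied from the rows shown above.
# The column names `name` and `label` are assumed; `n` and `integrated`
# are the fields described in the text.
avail <- data.frame(
  name = c("BAMA", "ICS", "DEM SUPP", "Fc Array"),
  label = c(
    "Binding Ab multiplex assay",
    "Intracellular Cytokine Staining",
    "Demographics (Supplemental)",
    "Fc Array"
  ),
  n = c(10260, 22684, NA, NA),
  integrated = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Keep only the datasets that are not integrated into the DataSpace.
non_integrated <- avail[!avail$integrated, ]
non_integrated$name
#> [1] "DEM SUPP" "Fc Array"

# Non-integrated datasets that have not been loaded yet still have n = NA.
not_loaded <- non_integrated$name[is.na(non_integrated$n)]
```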
Non-integrated datasets can be loaded with getDataset, just like integrated data. This unzips the non-integrated data to a temp directory and loads it into the environment.
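As a sketch of how this might look in practice, the helper below loads every non-integrated dataset listed for a study. The function name load_non_integrated is hypothetical, and the code assumes that availableDatasets has a name column holding the identifiers that getDataset accepts.

```r
# Hypothetical helper: load all non-integrated datasets of a study into a
# named list. `study` is a DataSpaceStudy object such as `vtn505`.
load_non_integrated <- function(study) {
  avail <- study$availableDatasets
  # Assumption: the `name` column holds the values getDataset() expects.
  dataset_names <- avail$name[!avail$integrated]
  datasets <- lapply(dataset_names, function(nm) study$getDataset(nm))
  names(datasets) <- dataset_names
  datasets
}

# Usage (requires a DataSpace connection):
# non_integrated <- load_non_integrated(vtn505)
# names(non_integrated)
```

Each call downloads and unzips one archive, so for studies with many non-integrated datasets it may be preferable to load only the ones you need.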
You can also view the file format information using getDatasetDescription. For non-integrated data, this opens a PDF in your computer’s default PDF viewer.
Non-integrated data is downloaded to a temp directory by default. There are two ways to override this. One is to specify outputDir when loading the dataset.
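A minimal sketch of the first option follows. It assumes that outputDir is passed as a named argument to getDataset; the helper name and the directory path are placeholders.

```r
# Hypothetical helper: download a non-integrated dataset into `dir`
# instead of the default temp directory.
load_into_dir <- function(study, dataset_name, dir) {
  # Create the target directory if it does not exist yet.
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  # Assumption: getDataset() accepts `outputDir` as a named argument.
  study$getDataset(dataset_name, outputDir = dir)
}

# Usage (requires a DataSpace connection; the path is a placeholder):
# fc_array <- load_into_dir(vtn505, "Fc Array", "~/dataspace_data/vtn505")
```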
If you will be accessing the data again later and don’t want to re-download it, you can instead change the default directory for the whole study object.
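The sketch below illustrates this second option. It assumes the study object provides a setDataDir method for this purpose; that name does not appear in the text above, so check it against the DataSpaceStudy reference before relying on it. The helper name and path are placeholders.

```r
# Hypothetical helper: point a study at a persistent data directory so that
# later getDataset() calls reuse the files stored there.
use_data_dir <- function(study, dir) {
  # Create the directory first so the study is never pointed at a missing path.
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  # Assumption: setDataDir() changes the study's default download directory.
  study$setDataDir(dir)
  invisible(study)
}

# Usage (requires a DataSpace connection; the path is a placeholder):
# use_data_dir(vtn505, "~/dataspace_data/vtn505")
# fc_array <- vtn505$getDataset("Fc Array")
```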
If the dataset already exists in the specified outputDir, it will not be downloaded again. This can be overridden with reload=TRUE, which forces a re-download.
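To illustrate the caching behaviour, the sketch below loads a dataset from a fixed directory and exposes a refresh switch. The helper name is hypothetical; outputDir and reload are the arguments named in the text, assumed here to be arguments of getDataset.

```r
# Hypothetical helper: load a dataset from `dir`, optionally forcing a
# fresh download even when the files are already present.
get_cached <- function(study, dataset_name, dir, refresh = FALSE) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  # Assumption: `outputDir` and `reload` are named arguments of getDataset().
  study$getDataset(dataset_name, outputDir = dir, reload = refresh)
}

# Usage (requires a DataSpace connection):
# data_dir <- file.path(tempdir(), "vtn505")
# fc_array <- get_cached(vtn505, "Fc Array", data_dir)                 # downloads
# fc_array <- get_cached(vtn505, "Fc Array", data_dir)                 # reuses files
# fc_array <- get_cached(vtn505, "Fc Array", data_dir, refresh = TRUE) # re-downloads
```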
sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#>
#> locale:
#>  LC_CTYPE=en_US.utf8 LC_NUMERIC=C
#>  LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
#>  LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
#>  LC_PAPER=en_US.utf8 LC_NAME=C
#>  LC_ADDRESS=C LC_TELEPHONE=C
#>  LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#>  stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#>  data.table_1.14.2 DataSpaceR_0.7.5 knitr_1.37
#>
#> loaded via a namespace (and not attached):
#>  Rcpp_1.0.8 digest_0.6.29 assertthat_0.2.1 R6_2.5.1
#>  jsonlite_1.8.0 magrittr_2.0.2 evaluate_0.15 highr_0.9
#>  httr_1.4.2 stringi_1.7.6 curl_4.3.2 tools_4.1.2
#>  stringr_1.4.0 Rlabkey_2.8.3 xfun_0.29 compiler_4.1.2