It is often the case that in an infectious disease outbreak epidemiological parameters are required in order to characterise and model the dynamics of disease transmission and evaluate control strategies. In those scenarios, epidemiological parameters are commonly retrieved from the literature, and there is currently no library of parameters in order to contrast and compare different reported parameters for a range of infectious diseases and pathogens, from different published studies over time, of which some may be meta-analyses.
The {epiparameter} R package is a library of epidemiological parameters, with classes to handle this data and a set of functions to manipulate and use epidemiological parameters and distributions. The package also contains functionality for converting and extracting distribution parameters from summary statistics.
An outbreak of a known or potentially novel pathogen is detected and key parameters such as delay distributions (e.g. incubation period or serial interval) are required to interpret early data.
{epiparameter} can provide these distributions from a selection of published sources, such as past analysis of the same or similar pathogen, in order to provide relevant epidemiological parameters for new analysis.
This vignette will provide a introduction to the data stored within {epiparameter}, how to read it into R, manipulate the data, and the functions (and methods) implemented in the package to facilitate easy application of parameters into epidemiological pipelines.
First, we will introduce the library, or database, of epidemiological parameters available from {epiparameter}. The library is stored internally and can be read into R using the epiparameter_db()
function. By default all entries in the library are returned.
db <- epiparameter_db()
#> Returning 125 results that match the criteria (100 are parameterised).
#> Use subset to filter by entry variables or single_epiparameter to return a single entry.
#> To retrieve the citation for each use the 'get_citation' function
db
#> # List of 125 <epiparameter> objects
#> Number of diseases: 23
#> ❯ Adenovirus ❯ COVID-19 ❯ Chikungunya ❯ Dengue ❯ Ebola Virus Disease ❯ Hantavirus Pulmonary Syndrome ❯ Human Coronavirus ❯ Influenza ❯ Japanese Encephalitis ❯ MERS ❯ Marburg Virus Disease ❯ Measles ❯ Mpox ❯ Parainfluenza ❯ Pneumonic Plague ❯ RSV ❯ Rhinovirus ❯ Rift Valley Fever ❯ SARS ❯ Smallpox ❯ West Nile Fever ❯ Yellow Fever ❯ Zika Virus Disease
#> Number of epi parameters: 13
#> ❯ case fatality risk ❯ generation time ❯ hospitalisation to death ❯ hospitalisation to discharge ❯ incubation period ❯ notification to death ❯ notification to discharge ❯ offspring distribution ❯ onset to death ❯ onset to discharge ❯ onset to hospitalisation ❯ onset to ventilation ❯ serial interval
#> [[1]]
#> Disease: Adenovirus
#> Pathogen: Adenovirus
#> Epi Parameter: incubation period
#> Study: Lessler J, Reich N, Brookmeyer R, Perl T, Nelson K, Cummings D (2009).
#> "Incubation periods of acute respiratory viral infections: a systematic
#> review." _The Lancet Infectious Diseases_.
#> doi:10.1016/S1473-3099(09)70069-6
#> <https://doi.org/10.1016/S1473-3099%2809%2970069-6>.
#> Distribution: lnorm (days)
#> Parameters:
#> meanlog: 1.723
#> sdlog: 0.231
#>
#> [[2]]
#> Disease: Human Coronavirus
#> Pathogen: Human_Cov
#> Epi Parameter: incubation period
#> Study: Lessler J, Reich N, Brookmeyer R, Perl T, Nelson K, Cummings D (2009).
#> "Incubation periods of acute respiratory viral infections: a systematic
#> review." _The Lancet Infectious Diseases_.
#> doi:10.1016/S1473-3099(09)70069-7
#> <https://doi.org/10.1016/S1473-3099%2809%2970069-7>.
#> Distribution: lnorm (days)
#> Parameters:
#> meanlog: 1.163
#> sdlog: 0.140
#>
#> [[3]]
#> Disease: SARS
#> Pathogen: SARS-Cov-1
#> Epi Parameter: incubation period
#> Study: Lessler J, Reich N, Brookmeyer R, Perl T, Nelson K, Cummings D (2009).
#> "Incubation periods of acute respiratory viral infections: a systematic
#> review." _The Lancet Infectious Diseases_.
#> doi:10.1016/S1473-3099(09)70069-8
#> <https://doi.org/10.1016/S1473-3099%2809%2970069-8>.
#> Distribution: lnorm (days)
#> Parameters:
#> meanlog: 1.386
#> sdlog: 0.593
#>
#> # ℹ 122 more elements
#> # ℹ Use `print(n = ...)` to see more elements.
#> # ℹ Use `parameter_tbl()` to see a summary table of the parameters.
#> # ℹ Explore database online at: https://epiverse-trace.github.io/epiparameter/articles/database.html
The output is a list of <epiparameter>
objects, where each element in the list corresponds to an entry in the parameter database.
To see a full list of the diseases and distributions stored in the library use the
parameter_tbl()
function. Here we show the first six rows of the output.
parameter_tbl(multi_epiparameter = db)
#> # Parameter table:
#> # A data frame: 125 × 7
#> disease pathogen epi_name prob_distribution author year sample_size
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 Adenovirus Adenovi… incubat… lnorm Lessl… 2009 14
#> 2 Human Coronavir… Human_C… incubat… lnorm Lessl… 2009 13
#> 3 SARS SARS-Co… incubat… lnorm Lessl… 2009 157
#> 4 Influenza Influen… incubat… lnorm Lessl… 2009 151
#> 5 Influenza Influen… incubat… lnorm Lessl… 2009 90
#> 6 Influenza Influen… incubat… lnorm Lessl… 2009 78
#> 7 Measles Measles… incubat… lnorm Lessl… 2009 55
#> 8 Parainfluenza Parainf… incubat… lnorm Lessl… 2009 11
#> 9 RSV RSV incubat… lnorm Lessl… 2009 24
#> 10 Rhinovirus Rhinovi… incubat… lnorm Lessl… 2009 28
#> # ℹ 115 more rows
parameter_tbl()
can also subset the database supplied to the function.
parameter_tbl(multi_epiparameter = db, disease = "Ebola")
#> # Parameter table:
#> # A data frame: 17 × 7
#> disease pathogen epi_name prob_distribution author year sample_size
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 Ebola Virus Dis… Ebola V… offspri… nbinom Lloyd… 2005 13
#> 2 Ebola Virus Dis… Ebola V… incubat… lnorm Eichn… 2011 196
#> 3 Ebola Virus Dis… Ebola V… onset t… gamma The E… 2018 14
#> 4 Ebola Virus Dis… Ebola V… incubat… gamma WHO E… 2015 1798
#> 5 Ebola Virus Dis… Ebola V… incubat… gamma WHO E… 2015 49
#> 6 Ebola Virus Dis… Ebola V… incubat… gamma WHO E… 2015 957
#> 7 Ebola Virus Dis… Ebola V… incubat… gamma WHO E… 2015 792
#> 8 Ebola Virus Dis… Ebola V… serial … gamma WHO E… 2015 305
#> 9 Ebola Virus Dis… Ebola V… serial … gamma WHO E… 2015 37
#> 10 Ebola Virus Dis… Ebola V… serial … gamma WHO E… 2015 147
#> 11 Ebola Virus Dis… Ebola V… serial … gamma WHO E… 2015 112
#> 12 Ebola Virus Dis… Ebola V… hospita… gamma WHO E… 2015 1167
#> 13 Ebola Virus Dis… Ebola V… hospita… gamma WHO E… 2015 1004
#> 14 Ebola Virus Dis… Ebola V… notific… gamma WHO E… 2015 2536
#> 15 Ebola Virus Dis… Ebola V… notific… gamma WHO E… 2015 1324
#> 16 Ebola Virus Dis… Ebola V… onset t… gamma WHO E… 2015 2741
#> 17 Ebola Virus Dis… Ebola V… onset t… gamma WHO E… 2015 1335
More details on the data collation and the library of parameters can be found in the Data Collation and Synthesis Protocol vignette.
{epiparameter} introduces a new class for working with epidemiological parameters in R: <epiparameter>
, contains the name of the disease, the name of the epidemiological distribution, parameters (if available) and citation information of parameter source, as well as other information. This is the core data structure in the {epiparameter} package and holds a single set of epidemiological parameters.
An <epiparameter>
object can be:
epiparameter_db()
)# <epiparameter> from database
# fetch <epiparameter> for COVID-19 incubation period from database
# return only a single <epiparameter>
covid_incubation <- epiparameter_db(
disease = "COVID-19",
epi_name = "incubation period",
single_epiparameter = TRUE
)
#> Using Linton N, Kobayashi T, Yang Y, Hayashi K, Akhmetzhanov A, Jung S, Yuan
#> B, Kinoshita R, Nishiura H (2020). "Incubation Period and Other
#> Epidemiological Characteristics of 2019 Novel Coronavirus Infections
#> with Right Truncation: A Statistical Analysis of Publicly Available
#> Case Data." _Journal of Clinical Medicine_. doi:10.3390/jcm9020538
#> <https://doi.org/10.3390/jcm9020538>..
#> To retrieve the citation use the 'get_citation' function
covid_incubation
#> Disease: COVID-19
#> Pathogen: SARS-CoV-2
#> Epi Parameter: incubation period
#> Study: Linton N, Kobayashi T, Yang Y, Hayashi K, Akhmetzhanov A, Jung S, Yuan
#> B, Kinoshita R, Nishiura H (2020). "Incubation Period and Other
#> Epidemiological Characteristics of 2019 Novel Coronavirus Infections
#> with Right Truncation: A Statistical Analysis of Publicly Available
#> Case Data." _Journal of Clinical Medicine_. doi:10.3390/jcm9020538
#> <https://doi.org/10.3390/jcm9020538>.
#> Distribution: lnorm (days)
#> Parameters:
#> meanlog: 1.525
#> sdlog: 0.629
epiparameter()
)# <epiparameter> using constructor function
covid_incubation <- epiparameter(
disease = "COVID-19",
pathogen = "SARS-CoV-2",
epi_name = "incubation period",
prob_distribution = create_prob_distribution(
prob_distribution = "gamma",
prob_distribution_params = c(shape = 2, scale = 1)
),
summary_stats = create_summary_stats(mean = 2),
citation = create_citation(
author = person(
given = list("John", "Amy"),
family = list("Smith", "Jones")
),
year = 2022,
title = "COVID Incubation Period",
journal = "Epi Journal",
doi = "10.27861182.x"
)
)
#> Using Smith J, Jones A (2022). "COVID Incubation Period." _Epi Journal_.
#> doi:10.27861182.x <https://doi.org/10.27861182.x>.
#> To retrieve the citation use the 'get_citation' function
covid_incubation
#> Disease: COVID-19
#> Pathogen: SARS-CoV-2
#> Epi Parameter: incubation period
#> Study: Smith J, Jones A (2022). "COVID Incubation Period." _Epi Journal_.
#> doi:10.27861182.x <https://doi.org/10.27861182.x>.
#> Distribution: gamma (NA)
#> Parameters:
#> shape: 2.000
#> scale: 1.000
Not all arguments are specified in the example using the class constructor (epiparameter()
) above, for example the metadata
or parameter uncertainty (uncertainty
) is not provided. See the help documentation for the epiparameter()
function using ?epiparameter
to see each argument. Also see documentation for <epiparameter>
helper functions, e.g., ?create_citation()
.
Manually creating <epiparameter>
objects can be especially useful if new parameter estimates become available but are not yet incorporated into the {epiparameter} library.
As seen in the examples in this vignette, the <epiparameter>
class has a custom printing method which shows the disease, pathogen (if known), the epidemiological distribution, a citation of the study the parameters are from and the probability distribution and parameter of that distribution (if available).
<epiparameter>
By providing a consistent and robust object to store epidemiological parameters, <epiparameter>
objects can be applied in epidemiological pipelines, for example {episoap}. The data contained within the object (e.g. parameter values, pathogen type, etc.) can be modified but the pipeline will continue to operate because the class is unchanged.
The probability distribution (prob_distribution
) argument requires the distribution specified in the standard R naming. In some cases these are the same as the distribution’s name, e.g., gamma
and weibull
. Examples of where the distribution name and R name differ are lognormal and lnorm
, negative binomial and nbinom
, geometric and geom
, and poisson and pois
.
The database can be subset directly by epiparameter_db()
. Here the results can be subset by author. It is recommended to use the family name of the first author instead of the full name. Only the first author will be matched when the entry is from a source with multiple authors.
epiparameter_db(
disease = "COVID-19",
epi_name = "incubation period",
author = "Linton"
)
#> Returning 3 results that match the criteria (3 are parameterised).
#> Use subset to filter by entry variables or single_epiparameter to return a single entry.
#> To retrieve the citation for each use the 'get_citation' function
#> # List of 3 <epiparameter> objects
#> Number of diseases: 1
#> ❯ COVID-19
#> Number of epi parameters: 1
#> ❯ incubation period
#> [[1]]
#> Disease: COVID-19
#> Pathogen: SARS-CoV-2
#> Epi Parameter: incubation period
#> Study: Linton N, Kobayashi T, Yang Y, Hayashi K, Akhmetzhanov A, Jung S, Yuan
#> B, Kinoshita R, Nishiura H (2020). "Incubation Period and Other
#> Epidemiological Characteristics of 2019 Novel Coronavirus Infections
#> with Right Truncation: A Statistical Analysis of Publicly Available
#> Case Data." _Journal of Clinical Medicine_. doi:10.3390/jcm9020538
#> <https://doi.org/10.3390/jcm9020538>.
#> Distribution: lnorm (days)
#> Parameters:
#> meanlog: 1.456
#> sdlog: 0.555
#>
#> [[2]]
#> Disease: COVID-19
#> Pathogen: SARS-CoV-2
#> Epi Parameter: incubation period
#> Study: Linton N, Kobayashi T, Yang Y, Hayashi K, Akhmetzhanov A, Jung S, Yuan
#> B, Kinoshita R, Nishiura H (2020). "Incubation Period and Other
#> Epidemiological Characteristics of 2019 Novel Coronavirus Infections
#> with Right Truncation: A Statistical Analysis of Publicly Available
#> Case Data." _Journal of Clinical Medicine_. doi:10.3390/jcm9020538
#> <https://doi.org/10.3390/jcm9020538>.
#> Distribution: lnorm (days)
#> Parameters:
#> meanlog: 1.611
#> sdlog: 0.472
#>
#> [[3]]
#> Disease: COVID-19
#> Pathogen: SARS-CoV-2
#> Epi Parameter: incubation period
#> Study: Linton N, Kobayashi T, Yang Y, Hayashi K, Akhmetzhanov A, Jung S, Yuan
#> B, Kinoshita R, Nishiura H (2020). "Incubation Period and Other
#> Epidemiological Characteristics of 2019 Novel Coronavirus Infections
#> with Right Truncation: A Statistical Analysis of Publicly Available
#> Case Data." _Journal of Clinical Medicine_. doi:10.3390/jcm9020538
#> <https://doi.org/10.3390/jcm9020538>.
#> Distribution: lnorm (days)
#> Parameters:
#> meanlog: 1.525
#> sdlog: 0.629
#>
#> # ℹ Use `parameter_tbl()` to see a summary table of the parameters.
#> # ℹ Explore database online at: https://epiverse-trace.github.io/epiparameter/articles/database.html
The results can be further subset using the subset
argument, for example subset = sample_size > 100
will return entries with a sample size greater than 100. See ?epiparameter_db()
for details on how to use this argument to subset which database entries get returned.
If a single <epiparameter>
is required then the single_epiparameter
argument can be set to TRUE
and this will return a single set of epidemiological parameters (i.e. one delay distribution), if available. If multiple entries in the parameter library match the search criteria (e.g. disease type) then the entries that are parameterised (i.e. distribution parameters are known), account for right truncation when inferred, and were estimated from the largest sample size are preferentially selected (in that order).
epiparameter_db(disease = "SARS", single_epiparameter = TRUE)
#> Using Lessler J, Reich N, Brookmeyer R, Perl T, Nelson K, Cummings D (2009).
#> "Incubation periods of acute respiratory viral infections: a systematic
#> review." _The Lancet Infectious Diseases_.
#> doi:10.1016/S1473-3099(09)70069-8
#> <https://doi.org/10.1016/S1473-3099%2809%2970069-8>..
#> To retrieve the citation use the 'get_citation' function
#> Disease: SARS
#> Pathogen: SARS-Cov-1
#> Epi Parameter: incubation period
#> Study: Lessler J, Reich N, Brookmeyer R, Perl T, Nelson K, Cummings D (2009).
#> "Incubation periods of acute respiratory viral infections: a systematic
#> review." _The Lancet Infectious Diseases_.
#> doi:10.1016/S1473-3099(09)70069-8
#> <https://doi.org/10.1016/S1473-3099%2809%2970069-8>.
#> Distribution: lnorm (days)
#> Parameters:
#> meanlog: 1.386
#> sdlog: 0.593
<epiparameter>
objects store distributions, and mathematical functions of these distribution can easily be extracted directly from them. It is often useful to access the probability density function, cumulative distribution function, quantiles of the distribution, or generate random numbers from the distribution in the <epiparameter>
object. The distribution functions in {epiparameter} allow users to easily use these.
ebola_incubation <- epiparameter_db(
disease = "Ebola",
epi_name = "incubation period",
single_epiparameter = TRUE
)
#> Using WHO Ebola Response Team, Agua-Agum J, Ariyarajah A, Aylward B, Blake I,
#> Brennan R, Cori A, Donnelly C, Dorigatti I, Dye C, Eckmanns T, Ferguson
#> N, Formenty P, Fraser C, Garcia E, Garske T, Hinsley W, Holmes D,
#> Hugonnet S, Iyengar S, Jombart T, Krishnan R, Meijers S, Mills H,
#> Mohamed Y, Nedjati-Gilani G, Newton E, Nouvellet P, Pelletier L,
#> Perkins D, Riley S, Sagrado M, Schnitzler J, Schumacher D, Shah A, Van
#> Kerkhove M, Varsaneux O, Kannangarage N (2015). "West African Ebola
#> Epidemic after One Year — Slowing but Not Yet under Control." _The New
#> England Journal of Medicine_. doi:10.1056/NEJMc1414992
#> <https://doi.org/10.1056/NEJMc1414992>..
#> To retrieve the citation use the 'get_citation' function
density(ebola_incubation, at = 0.5)
#> [1] 0.03608013
cdf(ebola_incubation, q = 0.5)
#> [1] 0.01178094
quantile(ebola_incubation, p = 0.5)
#> [1] 8.224347
generate(ebola_incubation, times = 10)
#> [1] 9.184342 10.562088 4.997837 1.695285 3.553407 6.656735 6.821006
#> [8] 7.451888 15.597509 17.019611
<epiparameter>
objects can easily be plotted to see the PDF and CDF of distribution.
The default plotting range for time since infection is from zero to the 99th quantile of the distribution. This can be altered by specifying the xlim
argument when plotting an <epiparameter>
object.
This plotting function can be useful for visually comparing epidemiological distributions from different publications on the same disease. In addition, plotting the distribution after manually creating an <epiparameter>
help to check that the parameters are sensible and produce the expected distribution.
The <epiparameter>
class also has accessor functions that can help access elements from the object in a standardised format.
get_parameters(ebola_incubation)
#> shape scale
#> 1.577781 6.528155
get_citation(ebola_incubation)
#> WHO Ebola Response Team, Agua-Agum J, Ariyarajah A, Aylward B, Blake I,
#> Brennan R, Cori A, Donnelly C, Dorigatti I, Dye C, Eckmanns T, Ferguson
#> N, Formenty P, Fraser C, Garcia E, Garske T, Hinsley W, Holmes D,
#> Hugonnet S, Iyengar S, Jombart T, Krishnan R, Meijers S, Mills H,
#> Mohamed Y, Nedjati-Gilani G, Newton E, Nouvellet P, Pelletier L,
#> Perkins D, Riley S, Sagrado M, Schnitzler J, Schumacher D, Shah A, Van
#> Kerkhove M, Varsaneux O, Kannangarage N (2015). "West African Ebola
#> Epidemic after One Year — Slowing but Not Yet under Control." _The New
#> England Journal of Medicine_. doi:10.1056/NEJMc1414992
#> <https://doi.org/10.1056/NEJMc1414992>.
Parameters are often reported in the literature as mean and standard deviation (or variance). These summary statistics can often be (analytically) converted to the parameters of the distribution using the conversion function in the package (convert_summary_stats_to_params()
). We also provide conversion functions in the opposite direction, parameters to summary statistics (convert_params_to_summary_stats()
).
The functions extract_param()
handles all the extraction of parameter estimates from summary statistics. The two extractions currently supported in {epiparameter} are from percentiles and from median and range.
If a set of epidemiological parameter has been inferred and is known to the user but has not yet been incorporated into the {epiparameter} database, these parameters can be manually added to the library.
# wrap <epiparameter> in list to append to database
new_db <- append(db, covid_incubation)
tail(new_db, n = 3)
#> [[1]]
#> Disease: Chikungunya
#> Pathogen: Chikungunya Virus
#> Epi Parameter: generation time
#> Study: Guzzetta G, Vairo F, Mammone A, Lanini S, Poletti P, Manica M, Rosa R,
#> Caputo B, Solimini A, della Torre A, Scognamiglio P, Zumla A, Ippolito
#> G, Merler S (2020). "Spatial modes for transmission of chikungunya
#> virus during a large chikungunya outbreak in Italy: a modeling
#> analysis." _BMC Medicine_. doi:10.1186/s12916-020-01674-y
#> <https://doi.org/10.1186/s12916-020-01674-y>.
#> Distribution: gamma (days)
#> Parameters:
#> shape: 8.633
#> scale: 1.447
#>
#> [[2]]
#> Disease: Chikungunya
#> Pathogen: Chikungunya Virus
#> Epi Parameter: case fatality risk
#> Study: de Souza W, de Lima S, Mello L, Candido D, Buss L, Whittaker C, Claro
#> I, Chandradeva N, Granja F, de Jesus R, Lemos P, Toledo-Teixeira D,
#> Barbosa P, Firmino A, Amorim M, Duarte L, Pessoa Jr I, Forato J,
#> Vasconcelos I, Maximo A, Araújo E, Mello L, Sabino E, Proença-Módena J,
#> Faria N, Weaver S (2023). "Spatiotemporal dynamics and recurrence of
#> chikungunya virus in Brazil: an epidemiological study." _The Lancet
#> Microbe_. doi:10.1016/S2666-5247(23)00033-2
#> <https://doi.org/10.1016/S2666-5247%2823%2900033-2>.
#> Parameters: <no parameters>
#> Mean: 1.3 (deaths per 1000 cases)
#>
#> [[3]]
#> Disease: COVID-19
#> Pathogen: SARS-CoV-2
#> Epi Parameter: incubation period
#> Study: Smith J, Jones A (2022). "COVID Incubation Period." _Epi Journal_.
#> doi:10.27861182.x <https://doi.org/10.27861182.x>.
#> Distribution: gamma (NA)
#> Parameters:
#> shape: 2.000
#> scale: 1.000
Note that this only adds the parameters to the library in the environment, and does not save to the database file in the package. Hence, if you restart your R session, you will lose the changes.
The library of epidemiological parameters is a living database, so as new studies are published we hope to incorporate these. Searching for and recording parameters in the database is extremely time-consuming, so we welcome contributions of new parameters by either making a pull request to the package or adding information to the contributing spreadsheet. These will be incorporated into the database by the package maintainers and your contributions will be acknowledged. See the Data Collation and Synthesis Protocol vignette on information about contributing to the library of epidemiological parameters.