Incidence rates describe the rate at which new events occur in a population, with the denominator being the person-time at risk of the event during the period of interest. In the previous vignettes we have seen how we can identify a set of denominator and outcome cohorts. Incidence rates can then be calculated using the time contributed by these denominator cohorts up to their entry into an outcome cohort.
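There are a number of options to consider when calculating incidence rates. This package accommodates two main parameters: the outcome washout (outcomeWashout), which determines how much time must pass after a previous outcome before an individual contributes time at risk, and repeated events (repeatedEvents), which determines whether individuals can contribute more than one event during the study period. The scenarios below show how these two parameters combine.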
In this example there is no outcome washout specified and repetitive events are not allowed, so individuals contribute time up to their first event during the study period.
In this example the outcome washout is all history and repetitive events are not allowed. As before individuals contribute time up to their first event during the study period, but having an outcome prior to the study period (such as person “3”) means that no time at risk is contributed.
In this example there is some amount of outcome washout and repetitive events are not allowed. As before individuals contribute time up to their first event during the study period, but having an outcome prior to the study period (such as person “3”) means that time at risk is only contributed once sufficient time has passed for the outcome washout criteria to have been satisfied.
Now repetitive events are allowed with some amount of outcome washout specified. So individuals contribute time up to their first event during the study period, and then after passing the outcome washout requirement they begin to contribute time at risk again.
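The washout scenarios above can be sketched with a small base R helper. This is a simplified illustration rather than the package's implementation: the function name first_day_at_risk() and the example dates are hypothetical, and the exact day-boundary conventions used by the package are not reproduced here.

```r
# Hypothetical helper (not part of IncidencePrevalence): returns the first day
# on which a person can contribute time at risk, given the end date of their
# most recent outcome before the study period and the washout in days.
first_day_at_risk <- function(study_start, prior_outcome_end, washout) {
  if (is.na(prior_outcome_end)) {
    return(study_start) # no prior outcome: at risk from study start
  }
  if (is.infinite(washout)) {
    return(as.Date(NA)) # all-history washout: no time at risk contributed
  }
  # Finite washout: at risk once `washout` days have passed since the outcome
  max(study_start, prior_outcome_end + washout)
}

study_start <- as.Date("2008-01-01")
prior_end <- as.Date("2007-11-01") # illustrative prior outcome date

no_washout <- first_day_at_risk(study_start, prior_end, washout = 0)
some_washout <- first_day_at_risk(study_start, prior_end, washout = 180)
all_history <- first_day_at_risk(study_start, prior_end, washout = Inf)
```

With no washout the person is at risk from the study start, with a 180-day washout their entry is delayed, and with an all-history washout the result is NA because no time at risk is contributed. When repetitive events are allowed, the same rule would apply again after each event during the study period.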
General information on how to define outcome cohorts can be found in the vignette “Creating outcome cohorts”. The most important recommendations for defining an outcome cohort for calculating incidence are:
generateDenominatorCohortSet() function.
Considering all the above, we recommend restricting outcome definitions to first events only if the user is not interested in further occurrences and if all prior history is used to exclude participants who have already experienced the event.
estimateIncidence() is the function we use to estimate incidence rates. To demonstrate its use, let's load the IncidencePrevalence package (along with a couple of packages to help with subsequent plots) and generate 50,000 example patients using the mockIncidencePrevalenceRef() function, from whom we'll create a denominator population without adding any restrictions other than a study period. In this example we'll use permanent tables (rather than temporary tables, which would be used by default).
library(IncidencePrevalence)
library(dplyr)
library(tidyr)

cdm <- mockIncidencePrevalenceRef(
  sampleSize = 50000,
  outPre = 0.5
)

cdm <- generateDenominatorCohortSet(
  cdm = cdm, name = "denominator",
  cohortDateRange = c(as.Date("2008-01-01"), as.Date("2012-01-01")),
  ageGroup = list(c(0, 150)),
  sex = "Both",
  daysPriorHistory = 0,
  temporary = FALSE
)
#> Creating denominator cohorts
#> Time taken to get cohorts: 0 min and 2 sec

cdm$denominator %>%
  glimpse()
#> Rows: ??
#> Columns: 4
#> Database: DuckDB 0.8.1 [eburn@Windows 10 x64:R 4.2.1/:memory:]
#> $ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ subject_id <chr> "2", "3", "4", "6", "7", "8", "12", "13", "19", "…
#> $ cohort_start_date <date> 2008-01-01, 2009-12-20, 2011-04-26, 2011-10-13, …
#> $ cohort_end_date <date> 2008-08-03, 2011-10-16, 2011-07-16, 2012-01-01, …
Let's first calculate incidence rates on a yearly basis, without allowing repetitive events.
inc <- estimateIncidence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = "years",
  outcomeWashout = 0,
  repeatedEvents = FALSE,
  temporary = FALSE
)

inc %>%
  glimpse()
#> Rows: 4
#> Columns: 30
#> $ analysis_id <chr> "1", "1", "1", "1"
#> $ n_persons <int> 7926, 7055, 6862, 6872
#> $ person_days <dbl> 1465281, 1285865, 1278246, 128…
#> $ n_events <int> 1672, 1708, 1682, 1708
#> $ incidence_start_date <date> 2008-01-01, 2009-01-01, 2010-0…
#> $ incidence_end_date <date> 2008-12-31, 2009-12-31, 2010-1…
#> $ person_years <dbl> 4011.721, 3520.507, 3499.647, …
#> $ incidence_100000_pys <dbl> 41677.88, 48515.75, 48061.99,…
#> $ incidence_100000_pys_95CI_lower <dbl> 39703.87, 46241.92, 45792.30,…
#> $ incidence_100000_pys_95CI_upper <dbl> 43724.63, 50872.45, 50415.07, …
#> $ cohort_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ result_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ outcome_cohort_id <chr> "1", "1", "1", "1"
#> $ outcome_cohort_name <chr> "cohort_1", "cohort_1", "cohor…
#> $ analysis_outcome_washout <dbl> 0, 0, 0, 0
#> $ analysis_repeated_events <lgl> FALSE, FALSE, FALSE, FALSE
#> $ analysis_interval <chr> "years", "years", "years", "ye…
#> $ analysis_complete_database_intervals <lgl> TRUE, TRUE, TRUE, TRUE
#> $ denominator_cohort_id <int> 1, 1, 1, 1
#> $ analysis_min_cell_count <dbl> 5, 5, 5, 5
#> $ denominator_cohort_name <chr> "Denominator cohort 1", "Denom…
#> $ denominator_age_group <chr> "0 to 150", "0 to 150", "0 to …
#> $ denominator_sex <chr> "Both", "Both", "Both", "Both"
#> $ denominator_days_prior_history <dbl> 0, 0, 0, 0
#> $ denominator_start_date <date> 2008-01-01, 2008-01-01, 2008-0…
#> $ denominator_end_date <date> 2012-01-01, 2012-01-01, 2012-0…
#> $ denominator_strata_cohort_definition_id <lgl> NA, NA, NA, NA
#> $ denominator_strata_cohort_name <lgl> NA, NA, NA, NA
#> $ denominator_closed_cohort <lgl> FALSE, FALSE, FALSE, FALSE
#> $ cdm_name <chr> "test_database", "test_databas…
plotIncidence(inc)
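To see how the columns in this output relate to one another, the short base R check below recomputes the rate for the first interval (2008) from the printed values. It is only a sanity check on the displayed numbers: the divisor of 365.25 days per year is an assumption that happens to be consistent with the printed person_days and person_years, and is not taken from the package documentation.

```r
# Values copied from the first yearly interval (2008) in the output above
n_events <- 1672
person_days <- 1465281
person_years <- 4011.721

# Assumption: person-years are person-days divided by 365.25
person_years_check <- person_days / 365.25

# Incidence rate per 100,000 person-years: events / person-time * 100,000
incidence_100000_pys <- n_events / person_years * 100000
round(incidence_100000_pys, 2)
```

The result should agree with the reported incidence_100000_pys for 2008 (41677.88) up to rounding of the printed person-years.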
Now with a washout of all prior history while still not allowing repetitive events. Here we use Inf to specify that we will use a washout of all prior history for an individual.
inc <- estimateIncidence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = "years",
  outcomeWashout = Inf,
  repeatedEvents = FALSE,
  temporary = FALSE
)

inc %>%
  glimpse()
#> Rows: 4
#> Columns: 30
#> $ analysis_id <chr> "1", "1", "1", "1"
#> $ n_persons <int> 6832, 6822, 6850, 6872
#> $ person_days <dbl> 1261517, 1252799, 1277510, 128…
#> $ n_events <int> 1672, 1708, 1682, 1708
#> $ incidence_start_date <date> 2008-01-01, 2009-01-01, 2010-0…
#> $ incidence_end_date <date> 2008-12-31, 2009-12-31, 2010-1…
#> $ person_years <dbl> 3453.845, 3429.977, 3497.632, …
#> $ incidence_100000_pys <dbl> 48409.81, 49796.26, 48089.68,…
#> $ incidence_100000_pys_95CI_lower <dbl> 46116.95, 47462.42, 45818.69,…
#> $ incidence_100000_pys_95CI_upper <dbl> 50787.16, 52215.16, 50444.11, …
#> $ cohort_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ result_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ outcome_cohort_id <chr> "1", "1", "1", "1"
#> $ outcome_cohort_name <chr> "cohort_1", "cohort_1", "cohor…
#> $ analysis_repeated_events <lgl> FALSE, FALSE, FALSE, FALSE
#> $ analysis_interval <chr> "years", "years", "years", "ye…
#> $ analysis_complete_database_intervals <lgl> TRUE, TRUE, TRUE, TRUE
#> $ denominator_cohort_id <int> 1, 1, 1, 1
#> $ analysis_outcome_washout <dbl> NA, NA, NA, NA
#> $ analysis_min_cell_count <dbl> 5, 5, 5, 5
#> $ denominator_cohort_name <chr> "Denominator cohort 1", "Denom…
#> $ denominator_age_group <chr> "0 to 150", "0 to 150", "0 to …
#> $ denominator_sex <chr> "Both", "Both", "Both", "Both"
#> $ denominator_days_prior_history <dbl> 0, 0, 0, 0
#> $ denominator_start_date <date> 2008-01-01, 2008-01-01, 2008-0…
#> $ denominator_end_date <date> 2012-01-01, 2012-01-01, 2012-0…
#> $ denominator_strata_cohort_definition_id <lgl> NA, NA, NA, NA
#> $ denominator_strata_cohort_name <lgl> NA, NA, NA, NA
#> $ denominator_closed_cohort <lgl> FALSE, FALSE, FALSE, FALSE
#> $ cdm_name <chr> "test_database", "test_databas…
plotIncidence(inc)
Now we'll set the washout to 180 days while still not allowing repetitive events.
inc <- estimateIncidence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = "years",
  outcomeWashout = 180,
  repeatedEvents = FALSE,
  temporary = FALSE
)

inc %>%
  glimpse()
#> Rows: 4
#> Columns: 30
#> $ analysis_id <chr> "1", "1", "1", "1"
#> $ n_persons <int> 7738, 7055, 6862, 6872
#> $ person_days <dbl> 1418998, 1285865, 1278246, 128…
#> $ n_events <int> 1672, 1708, 1682, 1708
#> $ incidence_start_date <date> 2008-01-01, 2009-01-01, 2010-0…
#> $ incidence_end_date <date> 2008-12-31, 2009-12-31, 2010-1…
#> $ person_years <dbl> 3885.005, 3520.507, 3499.647, …
#> $ incidence_100000_pys <dbl> 43037.27, 48515.75, 48061.99,…
#> $ incidence_100000_pys_95CI_lower <dbl> 40998.87, 46241.92, 45792.30,…
#> $ incidence_100000_pys_95CI_upper <dbl> 45150.78, 50872.45, 50415.07, …
#> $ cohort_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ result_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ outcome_cohort_id <chr> "1", "1", "1", "1"
#> $ outcome_cohort_name <chr> "cohort_1", "cohort_1", "cohor…
#> $ analysis_outcome_washout <dbl> 180, 180, 180, 180
#> $ analysis_repeated_events <lgl> FALSE, FALSE, FALSE, FALSE
#> $ analysis_interval <chr> "years", "years", "years", "ye…
#> $ analysis_complete_database_intervals <lgl> TRUE, TRUE, TRUE, TRUE
#> $ denominator_cohort_id <int> 1, 1, 1, 1
#> $ analysis_min_cell_count <dbl> 5, 5, 5, 5
#> $ denominator_cohort_name <chr> "Denominator cohort 1", "Denom…
#> $ denominator_age_group <chr> "0 to 150", "0 to 150", "0 to …
#> $ denominator_sex <chr> "Both", "Both", "Both", "Both"
#> $ denominator_days_prior_history <dbl> 0, 0, 0, 0
#> $ denominator_start_date <date> 2008-01-01, 2008-01-01, 2008-0…
#> $ denominator_end_date <date> 2012-01-01, 2012-01-01, 2012-0…
#> $ denominator_strata_cohort_definition_id <lgl> NA, NA, NA, NA
#> $ denominator_strata_cohort_name <lgl> NA, NA, NA, NA
#> $ denominator_closed_cohort <lgl> FALSE, FALSE, FALSE, FALSE
#> $ cdm_name <chr> "test_database", "test_databas…
plotIncidence(inc)
And finally we'll set the washout to 180 days and allow repetitive events.
inc <- estimateIncidence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = "years",
  outcomeWashout = 180,
  repeatedEvents = TRUE,
  temporary = FALSE
)

inc %>%
  glimpse()
#> Rows: 4
#> Columns: 30
#> $ analysis_id <chr> "1", "1", "1", "1"
#> $ n_persons <int> 7738, 7790, 7875, 7850
#> $ person_days <dbl> 1448414, 1454027, 1487577, 149…
#> $ n_events <int> 1672, 1708, 1682, 1708
#> $ incidence_start_date <date> 2008-01-01, 2009-01-01, 2010-0…
#> $ incidence_end_date <date> 2008-12-31, 2009-12-31, 2010-1…
#> $ person_years <dbl> 3965.541, 3980.909, 4072.764, …
#> $ incidence_100000_pys <dbl> 42163.22, 42904.77, 41298.74,…
#> $ incidence_100000_pys_95CI_lower <dbl> 40166.22, 40893.93, 39348.44,…
#> $ incidence_100000_pys_95CI_upper <dbl> 44233.81, 44988.92, 43320.69, …
#> $ cohort_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ result_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ outcome_cohort_id <chr> "1", "1", "1", "1"
#> $ outcome_cohort_name <chr> "cohort_1", "cohort_1", "cohor…
#> $ analysis_outcome_washout <dbl> 180, 180, 180, 180
#> $ analysis_repeated_events <lgl> TRUE, TRUE, TRUE, TRUE
#> $ analysis_interval <chr> "years", "years", "years", "ye…
#> $ analysis_complete_database_intervals <lgl> TRUE, TRUE, TRUE, TRUE
#> $ denominator_cohort_id <int> 1, 1, 1, 1
#> $ analysis_min_cell_count <dbl> 5, 5, 5, 5
#> $ denominator_cohort_name <chr> "Denominator cohort 1", "Denom…
#> $ denominator_age_group <chr> "0 to 150", "0 to 150", "0 to …
#> $ denominator_sex <chr> "Both", "Both", "Both", "Both"
#> $ denominator_days_prior_history <dbl> 0, 0, 0, 0
#> $ denominator_start_date <date> 2008-01-01, 2008-01-01, 2008-0…
#> $ denominator_end_date <date> 2012-01-01, 2012-01-01, 2012-0…
#> $ denominator_strata_cohort_definition_id <lgl> NA, NA, NA, NA
#> $ denominator_strata_cohort_name <lgl> NA, NA, NA, NA
#> $ denominator_closed_cohort <lgl> FALSE, FALSE, FALSE, FALSE
#> $ cdm_name <chr> "test_database", "test_databas…
plotIncidence(inc)
In the examples above, we have calculated incidence rates on a yearly basis, but they can also be calculated by weeks, months, quarters, or for the entire study period. In addition, we can decide whether to include time intervals that are not fully captured in the database (e.g., having data up to June for the last study year when computing yearly incidence rates). By default, incidence will only be estimated for those intervals that the database captures in full (completeDatabaseIntervals = TRUE).
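The idea of a complete database interval can be illustrated with a few lines of base R. This is a minimal sketch under stated assumptions, not the package's logic: the database coverage dates are hypothetical, and an interval is treated as complete simply when it lies entirely within that coverage.

```r
# Hypothetical database coverage: data available only up to June 2011
db_start <- as.Date("2008-01-01")
db_end <- as.Date("2011-06-30")

# Yearly intervals across a 2008 to 2011 study period
interval_start <- seq(as.Date("2008-01-01"), by = "year", length.out = 4)
interval_end <- seq(as.Date("2008-12-31"), by = "year", length.out = 4)

# An interval counts as complete if the database covers all of it
complete <- interval_start >= db_start & interval_end <= db_end

data.frame(interval_start, interval_end, complete)
```

In this sketch the 2011 interval is flagged as incomplete, so it is the one that would be left out when only complete intervals are kept.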
Given that we can set estimateIncidence() to exclude individuals based on other parameters (e.g., outcomeWashout), it is important to note that the denominator population used to compute incidence rates might differ from the one calculated with generateDenominatorCohortSet().
The user can also set the minimum number of events to be reported, below which results will be obscured. By default, results with fewer than 5 occurrences are obscured, but if minCellCount = 0, all results will be reported. 95% confidence intervals are calculated using the exact method. We can set verbose = TRUE to report progress as the code is running. By default, no progress is reported (verbose = FALSE).
inc <- estimateIncidence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = c("weeks"),
  completeDatabaseIntervals = FALSE,
  outcomeWashout = 180,
  repeatedEvents = TRUE,
  minCellCount = 0,
  temporary = FALSE
)
#> Getting incidence for analysis 1 of 1
#> Overall time taken: 0 mins and 3 secs
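As a rough illustration of the exact method mentioned above, the base R sketch below computes a 95% interval for the 2008 estimate from the first yearly analysis (no washout, no repetitive events). It assumes that "exact" refers to the standard exact Poisson interval in its chi-squared formulation; that reading is an assumption here, not something stated in the text, and the package's own code is not reproduced.

```r
# Values copied from the 2008 interval of the first yearly analysis
n_events <- 1672
person_years <- 4011.721

# Exact Poisson limits for the event count (chi-squared formulation)
alpha <- 0.05
count_lower <- qchisq(alpha / 2, df = 2 * n_events) / 2
count_upper <- qchisq(1 - alpha / 2, df = 2 * (n_events + 1)) / 2

# Convert the count limits to rates per 100,000 person-years
ci_lower <- count_lower / person_years * 100000
ci_upper <- count_upper / person_years * 100000
round(c(lower = ci_lower, upper = ci_upper), 2)
```

Under this assumption the limits should land close to the reported incidence_100000_pys_95CI_lower and incidence_100000_pys_95CI_upper for that interval (39703.87 and 43724.63).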
estimateIncidence() will generate a table with incidence rates for each of the time intervals studied and for each combination of the parameters set. Similar to the output obtained by generateDenominatorCohortSet(), the table generated will also be associated with attributes such as settings and attrition.
inc <- estimateIncidence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = c("Years"),
  outcomeWashout = c(0, 180),
  repeatedEvents = TRUE,
  temporary = FALSE,
  returnParticipants = TRUE
)

incidenceAttrition(inc)
#> # A tibble: 22 × 25
#> analysis_id number_records number_subjects reason_id reason excluded_records
#> <chr> <dbl> <dbl> <dbl> <glue> <dbl>
#> 1 1 50000 50000 1 Starti… NA
#> 2 1 50000 50000 2 Missin… 0
#> 3 1 50000 50000 3 Missin… 0
#> 4 1 50000 50000 4 Cannot… 0
#> 5 1 18018 18018 5 No obs… 31982
#> 6 1 18018 18018 6 Doesn'… 0
#> 7 1 18018 18018 7 Prior … 0
#> 8 1 18018 18018 10 No obs… 0
#> 9 1 25886 18018 11 Starti… NA
#> 10 1 24509 18018 12 Exclud… 1377
#> # ℹ 12 more rows
#> # ℹ 19 more variables: excluded_subjects <dbl>, outcome_cohort_id <chr>,
#> # outcome_cohort_name <chr>, analysis_outcome_washout <dbl>,
#> # analysis_repeated_events <lgl>, analysis_interval <chr>,
#> # analysis_complete_database_intervals <lgl>, denominator_cohort_id <int>,
#> # analysis_min_cell_count <dbl>, denominator_cohort_name <chr>,
#> # denominator_age_group <chr>, denominator_sex <chr>, …
If we set returnParticipants to TRUE, we can identify the individuals who contributed to the incidence rate analysis by using participants(). For example, we can identify those people contributing to analysis 1 by running:
participants(inc, analysisId = 1) %>%
glimpse()
#> Rows: ??
#> Columns: 4
#> Database: DuckDB 0.8.1 [eburn@Windows 10 x64:R 4.2.1/:memory:]
#> $ subject_id <chr> "6", "12", "13", "19", "21", "22", "29", "40", "42"…
#> $ cohort_start_date <date> 2011-10-13, 2008-01-01, 2008-01-01, 2008-03-08, 20…
#> $ cohort_end_date <date> 2012-01-01, 2009-03-08, 2009-12-29, 2009-03-10, 20…
#> $ outcome_start_date <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
As we've used permanent tables for this example, we can drop these after running our analysis.
CDMConnector::listTables(attr(cdm, "dbcon"))
#> [1] "cdm_source" "denominator" "denominator_attrition"
#> [4] "denominator_count" "denominator_set" "inc_participants1"
#> [7] "observation_period" "outcome" "outcome_count"
#> [10] "outcome_set" "person" "strata"
#> [13] "strata_count" "strata_set" "vocabulary"
CDMConnector::dropTable(cdm = cdm, name = starts_with("denominator"))
CDMConnector::dropTable(cdm = cdm, name = starts_with("inc_participants"))
CDMConnector::listTables(attr(cdm, "dbcon"))
#> [1] "cdm_source" "observation_period" "outcome"
#> [4] "outcome_count" "outcome_set" "person"
#> [7] "strata" "strata_count" "strata_set"
#> [10] "vocabulary"