---
title: "Cohort diagnostics"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{a03_CohortDiagnostics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>", message = FALSE, warning = FALSE,
  fig.width = 7
)

library(CDMConnector)
if (Sys.getenv("EUNOMIA_DATA_FOLDER") == "") Sys.setenv("EUNOMIA_DATA_FOLDER" = tempdir())
if (!dir.exists(Sys.getenv("EUNOMIA_DATA_FOLDER"))) dir.create(Sys.getenv("EUNOMIA_DATA_FOLDER"))
if (!eunomia_is_available()) downloadEunomiaData()
```

## Introduction
In this example we summarise cohort diagnostics results for four cohorts of individuals, those with an ankle sprain, an ankle fracture, a forearm fracture, or a hip fracture, using the Eunomia synthetic data.

Again, we'll begin by creating our study cohorts.

```{r}
library(CDMConnector)
library(CohortConstructor)
library(CodelistGenerator)
library(PatientProfiles)
library(CohortCharacteristics)
library(PhenotypeR)
library(dplyr)
library(ggplot2)

con <- DBI::dbConnect(duckdb::duckdb(),
  dbdir = CDMConnector::eunomia_dir()
)
cdm <- CDMConnector::cdm_from_con(con,
  cdm_schema = "main",
  write_schema = "main",
  cdm_name = "Eunomia"
)

cdm$injuries <- conceptCohort(cdm = cdm,
  conceptSet = list(
    "ankle_sprain" = 81151,
    "ankle_fracture" = 4059173,
    "forearm_fracture" = 4278672,
    "hip_fracture" = 4230399
  ),
  name = "injuries")
```
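
Before running any diagnostics it can help to confirm that each concept set produced a non-empty cohort. The sketch below assumes the `cdm$injuries` table created above and uses `cohortCount()` and `settings()` from omopgenerics, called with an explicit namespace because that package is not attached in this vignette. The object names `injury_counts` and `injury_settings` are illustrative only.

```{r}
# Number of records and subjects per cohort_definition_id
injury_counts <- omopgenerics::cohortCount(cdm$injuries)

# Cohort settings map each cohort_definition_id to its cohort_name
injury_settings <- omopgenerics::settings(cdm$injuries)

# Combine the two so the counts can be read by cohort name
injury_settings |>
  inner_join(injury_counts, by = "cohort_definition_id") |>
  select(cohort_name, number_records, number_subjects)
```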

## Cohort diagnostics

We can run the cohort diagnostics analyses for all of our cohorts in a single call:
```{r}
cohort_diag <- cohortDiagnostics(cdm$injuries)
```
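
The table and plot functions used in this vignette take the object returned by `cohortDiagnostics()` directly. As a rough check, and assuming that object is an omopgenerics `summarised_result` in your installed version, the following sketch looks at its structure and at the settings that describe each analysis it contains. `result_settings` is an illustrative name.

```{r}
# One row per estimate, in the long summarised_result layout
glimpse(cohort_diag)

# The settings table describes the analyses that contributed rows
# (assumes a settings() method is available for this result class)
result_settings <- omopgenerics::settings(cohort_diag)
result_settings
```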

Our results will include a summary of the overlap between our cohorts. We can visualise this overlap with `plotCohortOverlap()`:
```{r}
plotCohortOverlap(cohort_diag, uniqueCombinations = TRUE)
```

Our results also include a summary of the characteristics of each cohort, stratified by age group and sex.
```{r}
tableCharacteristics(cohort_diag, groupColumn = c("age_group", "sex"))
```

You can also visualise the age distribution:
```{r}
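# Boxplot of age; the argument is `plotStyle` or `plotType` depending on the CohortCharacteristics version
plotCharacteristics(filter(cohort_diag, variable_name == "Age"), plotStyle = "boxplot")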
```
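
Once we have finished with the results we can release the database connection. This is a minimal sketch that assumes the `con` object opened at the start of the vignette is still in scope; `shutdown = TRUE` is the duckdb-specific argument that also stops the embedded database instance.

```{r}
# Close the DuckDB connection and shut down the in-process database
DBI::dbDisconnect(con, shutdown = TRUE)
```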