--- title: "SPARRAfairness example" author: "James Liley, Ioanna Thoma" date: "`r Sys.Date()`" output: rmarkdown::html_vignette bibliography: SPARRAfairness.bib vignette: > %\VignetteIndexEntry{SPARRAfairness example} %\VignetteEngine{knitr::knitr} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Introduction In this vignette, we will demonstrate how to calculate and plot a range of performance metrics for a clinical risk score across demographic groups. We will use both simulated and real data. Please see our manuscript for further details. Our main risk score of interest is the SPARRA score [@sparrav3], which predicts emergency hospital admission in the coming year given data from electronic health records for the majority of the population of Scotland. We analyse the currently-used SPARRA version 3 score but this package and vignette also allows for analysis of the in-development SPARRA version 4 score [@sparrav4]. ## Simulation of data Raw data for the SPARRA version 3 score comprises individual-level medical records and is private. In order to demonstrate computation of performance metrics, we will begin by simulating semi-realistic data for 10,000 individuals. ```{r,echo=TRUE} # Load packages library(SPARRAfairness) library(ranger) # Get data data(all_data) data(decomposition_matrix) # Set random seed seed=463825 set.seed(seed) # Simulate data pop_data=sim_pop_data(10000) # First few rows head(pop_data) ``` ## Fit risk score We will fit a risk score to this data using a random forest, using the `ranger` package with default settings (500 trees). We will use scores generated from a model fitted directly to our simulated data, which will be somewhat overfitted, but for our purposes will be adequate. ```{r,echo=TRUE} # Fit model sim_model=ranger( target~age+sexM+raceNW+prevAdm+SIMD+urban_rural+mainland_island, data=pop_data) # Model predictions score=predict(sim_model,pop_data)$predictions # Affix to pop_data pop_data$score=score ``` ## Define groups We will define groups based on urban/rural postcode: 1 will be urban, 2 rural ```{r,echo=TRUE} group1=which(pop_data$urban_rural==FALSE) group2=which(pop_data$urban_rural==TRUE) ``` ## Generate metrics We will look at threshold score values evenly spaced between 0 and 1. ```{r,echo=TRUE} score_cutoffs=seq(0,1,length=101) ``` ### Score distributions We begin with distribution of scores, or demographic parity [@calders09,@zliobaite15]: ```{r,echo=TRUE} dem_par=demographic_parity(pop_data$score,group1,group2,cutoffs=score_cutoffs) ``` ### Counterfactual score distributions We now consider the distribution of scores of a counterfactual set of individuals, distributed as though they were urban residents, but are rural. We take age and sex as downstream of urban/rural status. Essentially we isolate causes of differential score distributions to those attributable to age and sex. We do this by generating a 'counterfactual sample' from our data. Please see the examples for `counterfactual_yhat` for a more in-depth look at what we mean by counterfactuals and what this funtion does. 

```{r,echo=TRUE}
# Counterfactual sample (samples are rural, but 'resemble' urban samples)
cf_rural_ids=counterfactual_yhat(dat=pop_data,X=c("age","sexM"),
                                 x=NULL,G="urban_rural",
                                 g=FALSE,gdash=TRUE,
                                 excl=c("id","score","target"))

# Put together into data frame
cf_pop_data=rbind(pop_data[group1,],pop_data[cf_rural_ids,])
cf_group1=which(cf_pop_data$urban_rural==FALSE)
cf_group2=which(cf_pop_data$urban_rural==TRUE)
cf_dem_par=demographic_parity(cf_pop_data$score,cf_group1,
                              cf_group2,cutoffs=score_cutoffs)
```

### Score performance

We then look at performance in each group, first via ROC curves. In generating ROC and PRC curves, the functions `getroc` and `getprc` return 100 equally-spaced points along the curve, rather than the full curve, which would have 10,000 points.

```{r,echo=TRUE}
roc1=getroc(pop_data$target[group1],pop_data$score[group1])
roc2=getroc(pop_data$target[group2],pop_data$score[group2])
```

then PR curves:

```{r,echo=TRUE}
prc1=getprc(pop_data$target[group1],pop_data$score[group1])
prc2=getprc(pop_data$target[group2],pop_data$score[group2])
```

then calibration curves [@brocker07]:

```{r,echo=TRUE}
cal1=getcal(pop_data$target[group1],pop_data$score[group1])
cal2=getcal(pop_data$target[group2],pop_data$score[group2])
```

### False omission and false discovery rates

We first assess the false omission rate. Where $Y$ is the target, $\hat{Y}$ is the score, and $G$ is the group, this is $P(Y=1|\hat{Y} \leq c,G=g)$, considered across groups $g$ and cutoffs $c$. We compute this by defining a specification vector (see the documentation for `group_fairness`):

```{r,echo=TRUE}
# Specification encoding the false omission rate (see ?group_fairness)
spec_for=c(NA,1,NA,0,NA,1)
for_12=group_fairness(spec_for,pop_data$score,pop_data$target,
                      group1,group2,cutoffs = score_cutoffs)
```

We then assess the false discovery rate: $P(Y=0|\hat{Y} > c,G=g)$, considered across groups $g$ and cutoffs $c$.

```{r,echo=TRUE}
# Specification encoding the false discovery rate (see ?group_fairness)
spec_fdr=c(NA,0,NA,1,NA,1)
fdr_12=group_fairness(spec_fdr,pop_data$score,pop_data$target,
                      group1,group2,cutoffs = score_cutoffs)
```

Since false omission and false discovery rates might be influenced by different distributions of age and sex between groups, we adjust for these. We define a categorisation by age and sex:

```{r,echo=TRUE}
# Joint age/sex categories used for adjustment
cat_12=paste0(pop_data$age,"_",pop_data$sexM)
for_adjusted_12=adjusted_for(pop_data$score,pop_data$target,cat_12,
                             group1,group2,cutoffs=score_cutoffs,nboot=10)
fdr_adjusted_12=adjusted_fdr(pop_data$score,pop_data$target,cat_12,
                             group1,group2,cutoffs=score_cutoffs,nboot=10)
```

## Plotting data

We now plot the output of these computations, alongside similar figures for real data. We first set some general plotting settings.

```{r,echo=TRUE}
alpha=0.05 # We will plot 1-alpha confidence intervals
q=-qnorm(alpha/2) # Multiply standard errors by this to get half CI width
highlight_value=0.5 # Highlight what happens at a cutoff of 0.5/50%
colour_scheme <- phs_colours(c("phs-blue", "phs-purple", "phs-magenta")) # set colour scheme
```

### Score distributions

We plot demographic parity on a log scale. Note that the score is given as a percentage.
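
As a quick sanity check before plotting, assuming that rows 1 and 3 of `dem_par` hold the proportion of each group's scores exceeding each cutoff (with rows 2 and 4 their standard errors), we can compare the values at the highlighted cutoff against a direct calculation:

```{r,echo=TRUE}
# Direct calculation of the proportion of each group with score above the
# highlighted cutoff (assumed interpretation of demographic parity here)
c0=highlight_value
c(urban=mean(pop_data$score[group1]>c0),
  rural=mean(pop_data$score[group2]>c0))

# Corresponding values from demographic_parity() at the same cutoff
i0=which.min(abs(score_cutoffs-c0))
c(urban=dem_par[1,i0],rural=dem_par[3,i0])
```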

```{r,echo=TRUE}
obj_list=list(
  list(x=score_cutoffs,y=dem_par[1,],ci=q*dem_par[2,]),
  list(x=score_cutoffs,y=dem_par[3,],ci=q*dem_par[4,])
)
groupmetric_2panel(obj_list,
                   labels=c("Urban","Rural"),
                   col=phs_colours(c("phs-blue","phs-magenta")),
                   ci_col=phs_colours(c("phs-blue","phs-magenta")),
                   yrange=c(0,1),
                   lpos="topleft",
                   yrange_lower=c(-0.2,0.2),
                   highlight=highlight_value,
                   logscale=TRUE)
```

An equivalent plot for real data:

```{r,echo=TRUE}
obj=all_data$dp_v3_Urban_rural_all
obj_list=list(
  list(x=exp(obj$xx),y=obj$yA,ci=obj$cA),
  list(x=exp(obj$xx),y=obj$yB,ci=obj$cB)
)
groupmetric_2panel(obj_list,
                   labels=c("Urban","Rural"),
                   col=phs_colours(c("phs-blue","phs-magenta")),
                   ci_col=phs_colours(c("phs-blue","phs-magenta")),
                   yrange=c(0,1),
                   lpos="topleft",
                   yrange_lower=c(-0.2,0.2),
                   highlight=highlight_value,
                   logscale=TRUE)
```

### Counterfactual score distributions

We plot counterfactual score distributions in a similar way:

```{r,echo=TRUE}
obj_list=list(
  list(x=score_cutoffs,y=cf_dem_par[1,],ci=q*cf_dem_par[2,]),
  list(x=score_cutoffs,y=cf_dem_par[3,],ci=q*cf_dem_par[4,])
)
groupmetric_2panel(obj_list,
                   labels=c("Urban","Rural"),
                   col=phs_colours(c("phs-blue","phs-magenta")),
                   ci_col=phs_colours(c("phs-blue","phs-magenta")),
                   yrange=c(0,1),
                   lpos="topleft",
                   yrange_lower=c(-0.2,0.2),
                   highlight=highlight_value,
                   logscale=TRUE)
```

An equivalent plot for real data:

```{r,echo=TRUE}
obj=all_data$counterfactual_dp_v3_Urban_rural
obj_list=list(
  list(x=exp(obj$xx),y=obj$yA,ci=obj$cA),
  list(x=exp(obj$xx),y=obj$yB,ci=obj$cB)
)
groupmetric_2panel(obj_list,
                   labels=c("Urban","Rural"),
                   col=phs_colours(c("phs-blue","phs-magenta")),
                   ci_col=phs_colours(c("phs-blue","phs-magenta")),
                   yrange=c(0,1),
                   lpos="topleft",
                   yrange_lower=c(-0.2,0.2),
                   highlight=highlight_value,
                   logscale=TRUE)
```

### Score performance

We plot ROC curves, PR curves and calibration curves for both groups.

```{r,echo=TRUE}
roc_2panel(list(roc1,roc2),
           labels = c("Urban","Rural"),
           col=colour_scheme,
           highlight=highlight_value,
           yrange_lower=c(-0.2,0.2))
```

For real data:

```{r,echo=TRUE}
roc_2panel(all_data$roc_v3_Urban_rural[1:3],
           labels = c("Overall","Urban","Rural"),
           col=colour_scheme,
           highlight=highlight_value,
           yrange_lower=c(-0.2,0.2))
```

We now plot PR curves:

```{r,echo=TRUE}
prc_2panel(list(prc1,prc2),
           labels = c("Urban","Rural"),
           col=colour_scheme,
           highlight=highlight_value,
           yrange_lower=c(-0.2,0.2))
```

For real data:

```{r,echo=TRUE}
prc_2panel(all_data$prc_v3_Urban_rural[1:3],
           labels = c("Overall","Urban","Rural"),
           col=colour_scheme,
           highlight=highlight_value,
           yrange_lower=c(-0.2,0.2))
```

Finally, we plot calibration curves (reliability diagrams):

```{r,echo=TRUE}
cal_2panel(list(cal1,cal2),
           labels = c("Urban","Rural"),
           col=colour_scheme,
           highlight=highlight_value,
           yrange_lower=c(-0.2,0.2))
```

For real data:

```{r,echo=TRUE}
cal_2panel(all_data$cal_v3_Urban_rural[1:3],
           labels = c("Overall","Urban","Rural"),
           col=colour_scheme,
           highlight=highlight_value,
           yrange_lower=c(-0.2,0.2))
```

### False omission and false discovery rates

We plot raw false omission rates as follows:

```{r,echo=TRUE}
obj_list=list(
  list(x=score_cutoffs,y=for_12[1,],ci=for_12[2,]),
  list(x=score_cutoffs,y=for_12[3,],ci=for_12[4,])
)
groupmetric_2panel(obj_list,
                   labels=c("Urban","Rural"),
                   col=phs_colours(c("phs-blue","phs-magenta")),
                   ci_col=phs_colours(c("phs-blue","phs-magenta")),
                   yrange=c(0,0.5),
                   lpos="topleft",
                   yrange_lower=c(-0.1,0.1),
                   highlight=highlight_value)
```

and false discovery rates as follows:

```{r,echo=TRUE}
obj_list=list(
  list(x=score_cutoffs,y=fdr_12[1,],ci=fdr_12[2,]),
  list(x=score_cutoffs,y=fdr_12[3,],ci=fdr_12[4,])
)
groupmetric_2panel(obj_list,
                   labels=c("Urban","Rural"),
                   col=phs_colours(c("phs-blue","phs-magenta")),
                   ci_col=phs_colours(c("phs-blue","phs-magenta")),
                   yrange=c(0,1),
                   lpos="topleft",
                   yrange_lower=c(-0.1,0.1),
                   highlight=highlight_value)
```

For real data, we plot false omission rates as follows:

```{r,echo=TRUE}
obj=all_data$forp_v3_Urban_rural_all
obj_list=list(
  list(x=obj$cutoffs,y=obj$p_AA,ci=obj$ci_AA),
  list(x=obj$cutoffs,y=obj$p_BB,ci=obj$ci_BB)
)
groupmetric_2panel(obj_list,
                   labels=c("Urban","Rural"),
                   col=phs_colours(c("phs-blue","phs-magenta")),
                   ci_col=phs_colours(c("phs-blue","phs-magenta")),
                   yrange=c(0,0.1),
                   lpos="topleft",
                   yrange_lower=c(-0.02,0.02),
                   highlight=highlight_value)
```

and false discovery rates as follows:

```{r,echo=TRUE}
obj=all_data$fdrp_v3_Urban_rural_all
obj_list=list(
  list(x=obj$cutoffs,y=obj$p_AA,ci=obj$ci_AA),
  list(x=obj$cutoffs,y=obj$p_BB,ci=obj$ci_BB)
)
groupmetric_2panel(obj_list,
                   labels=c("Urban","Rural"),
                   col=phs_colours(c("phs-blue","phs-magenta")),
                   ci_col=phs_colours(c("phs-blue","phs-magenta")),
                   yrange=c(0,1),
                   lpos="topright",
                   yrange_lower=c(-0.1,0.1),
                   highlight=highlight_value)
```

We plot adjusted false omission rates as follows:

```{r,echo=TRUE}
obj_list=list(
  list(x=score_cutoffs,y=for_adjusted_12[1,],ci=for_adjusted_12[2,]),
  list(x=score_cutoffs,y=for_adjusted_12[3,],ci=for_adjusted_12[4,])
)
groupmetric_2panel(obj_list,
                   labels=c("Urban","Rural"),
                   col=phs_colours(c("phs-blue","phs-magenta")),
                   ci_col=phs_colours(c("phs-blue","phs-magenta")),
                   yrange=c(0,0.5),
                   lpos="topleft",
                   yrange_lower=c(-0.1,0.1),
                   highlight=highlight_value)
```

and adjusted false discovery rates as follows:

```{r,echo=TRUE}
obj_list=list(
  list(x=score_cutoffs,y=fdr_adjusted_12[1,],ci=fdr_adjusted_12[2,]),
  list(x=score_cutoffs,y=fdr_adjusted_12[3,],ci=fdr_adjusted_12[4,])
)
groupmetric_2panel(obj_list,
                   labels=c("Urban","Rural"),
                   col=phs_colours(c("phs-blue","phs-magenta")),
                   ci_col=phs_colours(c("phs-blue","phs-magenta")),
                   yrange=c(0,1),
                   lpos="topleft",
                   yrange_lower=c(-0.1,0.1),
                   highlight=highlight_value)
```

For real data, we plot adjusted false omission rates as follows:

```{r,echo=TRUE}
obj=all_data$forp_adjusted_v3_Urban_rural
obj_list=list(
  list(x=obj$cutoffs,y=obj$p_AA,ci=obj$ci_AA),
  list(x=obj$cutoffs,y=obj$p_BB,ci=obj$ci_BB)
)
groupmetric_2panel(obj_list,
                   labels=c("Urban","Rural"),
                   col=phs_colours(c("phs-blue","phs-magenta")),
                   ci_col=phs_colours(c("phs-blue","phs-magenta")),
                   yrange=c(0,0.1),
                   lpos="topleft",
                   yrange_lower=c(-0.02,0.02),
                   highlight=highlight_value)
```

and adjusted false discovery rates as follows:

```{r,echo=TRUE}
obj=all_data$fdrp_adjusted_v3_Urban_rural
obj_list=list(
  list(x=obj$cutoffs,y=obj$p_AA,ci=obj$ci_AA),
  list(x=obj$cutoffs,y=obj$p_BB,ci=obj$ci_BB)
)
groupmetric_2panel(obj_list,
                   labels=c("Urban","Rural"),
                   col=phs_colours(c("phs-blue","phs-magenta")),
                   ci_col=phs_colours(c("phs-blue","phs-magenta")),
                   yrange=c(0,1),
                   lpos="topright",
                   yrange_lower=c(-0.1,0.1),
                   highlight=highlight_value)
```

## Decompose FOR by admission type

Amongst individuals who have had a hospital admission, we may wish to probe the distribution of *reasons* for admission. This distribution of reasons may be expected to change across demographic groups. We will use a score threshold of 50% (0.5). We begin by decomposing admissions by type and score quantile. See the documentation for `plot_decomp()` for details of what we produce.
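
As a toy illustration of the kind of object involved (hypothetical data and a hypothetical `toy_type` variable; this is not the structure produced by `dat2mat`), a decomposition of this sort amounts to a cross-tabulation of admission type against score quantile:

```{r,echo=TRUE}
# Toy illustration only: hypothetical admission types cross-tabulated
# against score quantile. The dat2mat() call below produces the matrices
# actually used by plot_decomp().
toy_score=runif(1000)
toy_type=sample(c("Respiratory","Cardiac","Other"),1000,replace=TRUE)
toy_quantile=cut(toy_score,quantile(toy_score,seq(0,1,length=21)),
                 include.lowest=TRUE,labels=paste0("q",1:20))
head(table(toy_quantile,toy_type))
```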

```{r,echo=TRUE,fig.width=8,fig.height=6}
cutoff=0.5 # 50% risk score threshold
decomp_matrices=dat2mat(pop_data,
                        score=pop_data$score,
                        group1=which(pop_data$urban_rural==0),
                        group2=which(pop_data$urban_rural==1),
                        nquant=20)
plot_decomp(decomp_matrices$matrix1,
            decomp_matrices$matrix2,
            threshold=cutoff,
            labels=c("Urban","Rural"))
```

We now plot the corresponding figure for real data:

```{r,echo=TRUE,fig.width=8,fig.height=6}
cutoff=0.1 # 10% risk score threshold
names_group1=paste0("v3_Urban_q",1:20)
names_group2=paste0("v3_Rural_q",1:20)
decomp1=decomposition_matrix[names_group1,]
decomp2=decomposition_matrix[names_group2,]
plot_decomp(decomp1,
            decomp2,
            threshold=cutoff,
            labels=c("Urban","Rural"))
```

## Frequencies of admission types amongst false omissions

It is useful to consider the proportion of each admission type (respiratory, cardiac, etc.) amongst _individuals with a low SPARRA score_, as compared with the proportion amongst all individuals. To do this, we consider a particular category (e.g., urban residents), a score threshold, and the following two groups of individuals:

1. All individuals in that category who had an emergency admission
2. All individuals in that category who had an emergency admission and who had a SPARRA score less than the threshold

Individuals in group 2 are, in a sense, those for whom an emergency admission was unexpected. If a particular admission type is more common in group 2 than in group 1, this indicates that such admissions are relatively harder to predict, and hence less expected, amongst individuals in that category.

We first plot an example with simulated data:

```{r,echo=TRUE}
cutoff=0.5 # 50% risk score threshold
decomp_matrices=dat2mat(pop_data,
                        score=pop_data$score,
                        group1=which(pop_data$urban_rural==0),
                        group2=which(pop_data$urban_rural==1),
                        nquant=20)
for_breakdown(decomp_matrices$matrix1,
              group="Urban",
              threshold=cutoff)
```

We now plot the corresponding figure for real data:

```{r,echo=TRUE}
cutoff=0.1 # 10% risk score threshold
names_group1=paste0("v3_Urban_q",1:20)
decomp1=decomposition_matrix[names_group1,]
for_breakdown(decomp1,
              group="Urban",
              threshold=cutoff)
```

## Plot a diagram

Finally, if we so wish, we may illustrate a proportion as a group of human figures in various colours. We may plot an individual human figure as follows:

```{r,echo=TRUE}
plot(0,type="n",bty="n",ann=F,xlim=c(-1,1)/2,ylim=c(-1,1)/2)
drawperson(0,0,col="blue",border="gray",lwd=3)
```

and illustrate a proportion as follows. Human figures in the 'ci' region are coloured with a gradient between the two colours.

```{r,echo=TRUE,fig.width=6,fig.height=6}
drawprop(0.3,ci=c(0.2,0.4),col1 = "blue",col2="gray")
```

## References