--- title: "Adding p-values" output: rmarkdown::html_vignette: fig_width: 7 vignette: > %\VignetteIndexEntry{Adding p-values} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r} library(ggplot2) library(ggprism) library(patchwork) library(magrittr) ``` Arguably one of the most popular features of GraphPad Prism is adding p-values to plots. Indeed in [Prism 9](https://www.graphpad.com/updates/prism-900-release-notes), GraphPad have added a feature to automatically perform pairwise comparisons and add the resulting p-values with brackets to the graph. `ggprism` includes the `add_pvalue()` function to add p-values with or without brackets to ggplots. This vignette will go through the many ways in which this function can be used. This function is a re-written version of `stat_pvalue_manual()` from the [`ggpubr` package](https://github.com/kassambara/ggpubr), which itself is based on the `geom_signif()` function from the [`ggsignif` package](https://github.com/const-ae/ggsignif). Compared to `stat_pvalue_manual()`, the `add_pvalue()` function is: easier to use, more robust with less dependencies, and has more customisable brackets. ## Basic use (brackets) To add significance brackets to a plot, you need a minimal data.frame with 4 columns and a number of rows corresponding to the number of brackets you want to add. The 4 columns should correspond to these 4 function arguments: 1. xmin, the left hand position of the bracket (default column name is `"group1"`) 1. xmax, the right hand position of the bracket (default column name is `"group2"`) 1. label, the text to go on the brackets (default column name is `"label"`) 1. y.position, the vertical position of the bracket (default column name is `"y.position"`) For grouped or faceted data you'll also need a column which is named according to the grouping variable. See the **Many more examples** section for help with this/examples. Let's see how this works in practice. First we'll plot the `sleep` data set. ```{r} str(sleep) ``` ```{r, fig.width=4, fig.height=3.5} # create a jitter plot of the sleep data set # and indicate the means p <- ggplot(sleep, aes(x = group, y = extra)) + geom_jitter(aes(shape = group), width = 0.1) + stat_summary(geom = "crossbar", fun = mean, colour = "red", width = 0.2) + theme_prism() + theme(legend.position = "none") p ``` Next we'll perform a t-test and obtain a p-value for the difference between the two means. ```{r} # perform a t-test and obtain the p-value result <- t.test(extra ~ group, data = sleep)$p.value result <- signif(result, digits = 3) result ``` Now we'll construct a p-value data.frame for `add_pvalue()` to use. ```{r} df_p_val <- data.frame( group1 = "1", group2 = "2", label = result, y.position = 6 ) ``` And finally we'll add this p-value to our plot. Because we have used the default column names (see above) in our p-value table we don't necessarily have to specify any arguments of `add_pvalue()`. However, here we'll do it for clarity's sake. Additionally, if your p-value table has special column names, you will need to specify them in `add_pvalue()`. ```{r, fig.height=3.5} # add p-value brackets p1 <- p + add_pvalue(df_p_val, xmin = "group1", xmax = "group2", label = "label", y.position = "y.position") # change column names to something silly colnames(df_p_val) <- c("apple", "banana", "some_label", "some_y_position") # add p-value brackets again p2 <- p + add_pvalue(df_p_val, xmin = "apple", xmax = "banana", label = "some_label", y.position = "some_y_position") p1 + p2 ``` ```{r} # return column names back to default colnames(df_p_val) <- c("group1", "group2", "label", "y.position") ``` You can easily change how the bracket and label looks. You can make the label a `glue` expression. You can also change the tip length of the bracket. Lastly, you can flip the label when using `coord_flip()`. ```{r, fig.width=7, fig.height=7} # change bracket and label aesthetics p1 <- p + add_pvalue(df_p_val, colour = "red", # label label.size = 8, # label fontface = "bold", # label fontfamily = "serif", # label angle = 45, # label hjust = 1, # label vjust = 2, # label bracket.colour = "blue", # bracket bracket.size = 1, # bracket linetype = "dashed", # bracket lineend = "round") # bracket # use glue expression for label p2 <- p + add_pvalue(df_p_val, label = "p = {label}") # make bracket tips longer and use coord_flip p3 <- p + add_pvalue(df_p_val, tip.length = 0.15, coord.flip = TRUE) + coord_flip() # change bracket tips independently # (make one side disappear and the other longer) p4 <- p + add_pvalue(df_p_val, tip.length = c(0.2, 0)) (p1 + p2) / (p3 + p4) ``` ## Basic use (no brackets) Even if you don't want brackets, `add_pvalue()` is also useful for adding significance text to plots with the correct/automatic positioning. In the example above, if you wanted the text but not the bracket, you can just use the `remove.bracket` argument. In this case, you must use the `x` argument to change the x position of the text. ```{r, fig.height=3.5} # position label above "group1" p1 <- p + add_pvalue(df_p_val, label = "p = {label}", remove.bracket = TRUE, x = 1) # position label between x = 1 and x = 2 p2 <- p + add_pvalue(df_p_val, label = "p = {label}", remove.bracket = TRUE, x = 1.5) p1 + p2 ``` Here is another example of 'text only' plot using the `ToothGrowth` data set. ```{r} str(ToothGrowth) ``` ```{r, fig.width=4, fig.height=3.5} # create a box plot of the ToothGrowth data set p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) + geom_boxplot(aes(fill = dose), colour = "black") + theme_prism() + theme(legend.position = "none") p ``` Next we'll perform two t-tests and compare the means against `dose = 0.5` as a reference group. Then we'll correct the p-values for multiple testing. ```{r} # compare means again reference result1 <- t.test(len ~ dose, data = subset(ToothGrowth, dose %in% c(0.5, 1.0)))$p.value result2 <- t.test(len ~ dose, data = subset(ToothGrowth, dose %in% c(0.5, 2.0)))$p.value # Benjamini-Hochberg correction for multiple testing result <- p.adjust(c(result1, result2), method = "BH") ``` We can now construct a p-value table. Note that in this case we don't need to to specify a `"group2"` column for xmax. This is because text-only p-value annotations just have an x position (`x`) and not an x range (`xmin` and `xmax`). ```{r} # don't need group2 column (i.e. xmax) # instead just specify x position in the same way as y.position df_p_val <- data.frame( group1 = c(0.5, 0.5), group2 = c(1, 2), x = c(2, 3), label = signif(result, digits = 3), y.position = c(35, 35) ) ``` Then we add the p-values to the plot. As before, you can change how the labels look quite easily. ```{r, fig.height=3.5} p1 <- p + add_pvalue(df_p_val, xmin = "group1", x = "x", label = "label", y.position = "y.position") p2 <- p + add_pvalue(df_p_val, xmin = "group1", x = "x", label = "p = {label}", y.position = "y.position", label.size = 3.2, fontface = "bold") p1 + p2 ``` If you want the label number format to look nicer you can provide a column name to `label` with _plotmath_ expressions and set `parse = TRUE`. This works with or without brackets. ```{r, fig.height=3.5} # plotmath expression to have superscript exponent df_p_val$p.exprs <- paste0("P==1*x*10^", round(log10(df_p_val$label), 0)) # as above but with italics df_p_val$p.exprs.ital <- lapply( paste(round(log10(df_p_val$label), 0)), function(x) bquote(italic("P = 1x10"^.(x))) ) p1 <- p + add_pvalue(df_p_val, xmin = "group1", x = "x", label = "p.exprs", y.position = "y.position", parse = TRUE) p2 <- p + add_pvalue(df_p_val, xmin = "group1", x = "x", label = "p.exprs.ital", y.position = "y.position", parse = TRUE) p1 + p2 ``` ## Using the `rstatix` package As `add_pvalue()` is ultimately just a rewritten version of `stat_pvalue_manual()`, it works well with the [`rstatix` package](https://github.com/kassambara/rstatix). With [`rstatix`](https://github.com/kassambara/rstatix), you can perform the statistical test and create the p-value table with the appropriate x and y position automatically, in a single step. ```{r, fig.width=4, fig.height=3.5} df_p_val <- rstatix::t_test(ToothGrowth, len ~ dose, ref.group = "0.5") %>% rstatix::add_xy_position() p + add_pvalue(df_p_val, label = "p = {p.adj}", remove.bracket = TRUE) ``` ## Many more examples Here we will use `add_pvalue()` and [`rstatix`](https://github.com/kassambara/rstatix) to show many more examples of how to add p-values to different plots. ### Compare two means Compare mean `len` depending on `supp`. Error bars indicate 1 standard deviation from the mean. ```{r, fig.width=4, fig.height=3.5} df_p_val <- rstatix::t_test(ToothGrowth, len ~ supp) %>% rstatix::add_x_position() p <- ggplot(ToothGrowth, aes(x = factor(supp), y = len)) + stat_summary(geom = "col", fun = mean) + stat_summary(geom = "errorbar", fun = mean, fun.min = function(x) mean(x) - sd(x), fun.max = function(x) mean(x) + sd(x), width = 0.3) + theme_prism() + coord_cartesian(ylim = c(0, 35)) + scale_y_continuous(breaks = seq(0, 35, 5), expand = c(0, 0)) # normal plot p + add_pvalue(df_p_val, y.position = 30) ``` ### Compare mean vs reference Compare mean `len` of each dose to `dose = 0.5`. Error bars indicate 1 standard deviation from the mean. ```{r, fig.height=3.5} df_p_val <- rstatix::t_test(ToothGrowth, len ~ dose, ref.group = "0.5") %>% rstatix::add_xy_position() p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) + stat_summary(geom = "col", fun = mean) + stat_summary(geom = "errorbar", fun = mean, fun.min = function(x) mean(x) - sd(x), fun.max = function(x) mean(x) + sd(x), width = 0.3) + theme_prism() + coord_cartesian(ylim = c(0, 40)) + scale_y_continuous(breaks = seq(0, 40, 5), expand = c(0, 0)) # with brackets p1 <- p + add_pvalue(df_p_val, label = "p.adj.signif") # without brackets p2 <- p + add_pvalue(df_p_val, label = "p.adj.signif", remove.bracket = TRUE) p1 + p2 ``` Now, compare overall mean `len` (base mean) to the mean `len` for each `dose`. Error bars indicate 1 standard deviation from the mean. ```{r, fig.width=4, fig.height=3.5} df_p_val <- rstatix::t_test(ToothGrowth, len ~ dose, ref.group = "all") p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) + stat_summary(geom = "col", fun = mean) + stat_summary(geom = "errorbar", fun = mean, fun.min = function(x) mean(x) - sd(x), fun.max = function(x) mean(x) + sd(x), width = 0.3) + theme_prism() + coord_cartesian(ylim = c(0, 40)) + scale_y_continuous(breaks = seq(0, 40, 5), expand = c(0, 0)) p + add_pvalue(df_p_val, label = "p.adj.signif", y.position = 35) ``` Now, compare the mean `len` for each `dose` to some arbitrary value, say `26` in this case. Error bars indicate 1 standard deviation from the mean. ```{r, fig.width=4, fig.height=3.5} df_p_val <- ToothGrowth %>% rstatix::group_by(factor(dose)) %>% rstatix::t_test(len ~ 1, mu = 26) %>% rstatix::adjust_pvalue(p.col = "p", method = "holm") %>% rstatix::add_significance(p.col = "p.adj") p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) + stat_summary(geom = "col", fun = mean) + stat_summary(geom = "errorbar", fun = mean, fun.min = function(x) mean(x) - sd(x), fun.max = function(x) mean(x) + sd(x), width = 0.3) + theme_prism() + coord_cartesian(ylim = c(0, 40)) + scale_y_continuous(breaks = seq(0, 40, 5), expand = c(0, 0)) # remember xmin and x are referring to the column dames in df_p_val p + add_pvalue(df_p_val, xmin = "group1", x = "factor(dose)", y = 37, label = "p.adj.signif") ``` ### Multiple pairwise comparisons Compare mean `len` across the 3 different `dose`. Use the `bracket.shorten` argument to slightly shorten side-by-side brackets. ```{r, fig.width=4, fig.height=3.5} df_p_val <- rstatix::t_test(ToothGrowth, len ~ dose) p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) + geom_violin(trim = FALSE) + geom_boxplot(width = 0.2) + theme_prism() + coord_cartesian(ylim = c(0, 45)) + scale_y_continuous(breaks = seq(0, 45, 5), expand = c(0, 0)) p + add_pvalue(df_p_val, y.position = c(44, 41, 44), bracket.shorten = c(0.025, 0, 0.025)) ``` ### Grouped pairwise comparisons Pairwise comparisons _between groups_ of the `ToothGrowth` data set, grouped according to `supp`. The boxplots and the brackets are automatically coloured according to `supp`. Three important points for this graph: 1. The p-value data.frame **must** have a column named with the grouping variable (in this case named `supp`) and the column must contain the groups to group by (in this case `"OJ" or "VC"`. 1. Any grouping of data **must** be done in the individual Geom functions (in this case `geom_boxplot()`) and **not** in the `ggplot()` function. 1. You can use the `step.group.by = "supp"` to automatically change the bracket spacing between different groups. ```{r, fig.width=5, fig.height=3.5} df_p_val <- ToothGrowth %>% rstatix::group_by(supp) %>% rstatix::t_test(len ~ dose) %>% rstatix::add_xy_position() p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) + geom_boxplot(aes(fill = supp)) + theme_prism() # remember colour and step.group.by are referring to a column name in df_p_val p + add_pvalue(df_p_val, label = "p = {p.adj}", colour = "supp", fontface = "bold", step.group.by = "supp", step.increase = 0.1, tip.length = 0, bracket.colour = "black", show.legend = FALSE) ``` Pairwise comparisons _within groups_ of the `ToothGrowth` data set, grouped according to `supp`. ```{r, fig.width=5, fig.height=3.5} df_p_val <- ToothGrowth %>% rstatix::group_by(dose) %>% rstatix::t_test(len ~ supp) %>% rstatix::adjust_pvalue(p.col = "p", method = "bonferroni") %>% rstatix::add_significance(p.col = "p.adj") %>% rstatix::add_xy_position(x = "dose", dodge = 0.8) # important for positioning! p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) + geom_boxplot(aes(fill = supp)) + theme_prism() + coord_cartesian(ylim = c(0, 40)) p + add_pvalue(df_p_val, xmin = "xmin", xmax = "xmax", label = "p = {p.adj}", tip.length = 0) ``` Pairwise comparisons _within groups_ and _between groups_ of the `ToothGrowth` data set, grouped according to `supp`. You can use `bracket.nudge.y` to slightly adjust the overall y position of the brackets instead of having to redefine `df_p_val2`. ```{r, fig.width=5, fig.height=4} df_p_val1 <- ToothGrowth %>% rstatix::group_by(dose) %>% rstatix::t_test(len ~ supp) %>% rstatix::adjust_pvalue(p.col = "p", method = "bonferroni") %>% rstatix::add_significance(p.col = "p.adj") %>% rstatix::add_xy_position(x = "dose", dodge = 0.8) # important for positioning! df_p_val2 <- rstatix::t_test(ToothGrowth, len ~ dose, p.adjust.method = "bonferroni") %>% rstatix::add_xy_position() p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) + geom_boxplot(aes(fill = supp)) + theme_prism() + coord_cartesian(ylim = c(0, 45)) p + add_pvalue(df_p_val1, xmin = "xmin", xmax = "xmax", label = "p = {p.adj}", tip.length = 0) + add_pvalue(df_p_val2, label = "p = {p.adj}", tip.length = 0.01, bracket.nudge.y = 2, step.increase = 0.015) ``` ### Facets Facet according to `dose` and then compare mean `len` between either `supp`. It is important that the p-value table **must** have a column with the same name as the faceting variable (in this case `"dose"`). ```{r, fig.height=3.5} df_p_val <- ToothGrowth %>% rstatix::group_by(dose) %>% rstatix::t_test(len ~ supp) %>% rstatix::add_xy_position() p <- ggplot(ToothGrowth, aes(x = factor(supp), y = len)) + geom_boxplot(width = 0.2) + facet_wrap( ~ dose, scales = "free", labeller = labeller(dose = function(x) paste("dose =", x)) ) + theme_prism() p + add_pvalue(df_p_val) ``` Facet according to `supp` and then compare mean `len` between the three `dose`. It is important that the p-value table **must** have a column with the same name as the faceting variable (in this case `"supp"`). ```{r, fig.height=3.5} df_p_val <- ToothGrowth %>% rstatix::group_by(supp) %>% rstatix::t_test(len ~ dose) %>% rstatix::add_xy_position() p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) + geom_boxplot(width = 0.4) + facet_wrap(~ supp, scales = "free") + theme_prism() p + add_pvalue(df_p_val) ``` Facet according to some particular grouping variable called `grp` and `dose`, and then compare mean `len` between either `supp`. It is important that the p-value table **must** have columns with the same names as the two faceting variables (in this case `"grp"` and `"dose"`). ```{r, fig.height=7} # add a grouping variable to ToothGrowth tg <- ToothGrowth tg$dose <- factor(tg$dose) tg$grp <- factor(rep(c("grp1", "grp2"), 30)) # construct the p-value table by hand df_p_val <- data.frame( group1 = c("OJ", "OJ"), group2 = c("VC", "VC"), p.adj = c(0.0449, 0.00265), y.position = c(22, 27), grp = c("grp1", "grp2"), dose = c("0.5", "1") ) p <- ggplot(tg, aes(x = factor(supp), y = len)) + geom_boxplot(width = 0.4) + facet_wrap(grp ~ dose, scales = "free") + theme_prism() p + add_pvalue(df_p_val, bracket.nudge.y = 3) ```