---
title: "Plotting BRAID Surfaces"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Plotting BRAID Surfaces}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, include = FALSE}
library(braidReports)
set.seed(20240828)
```

## Introduction

The `ggplot2` package is one of the most powerful and widely used toolkits available for visualizing data in R, and it's easy to understand why. Its well-crafted grammar of layers, aesthetics, scales, and facets allows for extraordinary versatility and power while remaining remarkably intuitive and producing delightfully tidy code. It is my nearly universal go-to for generating visual expressions of data; but when plotting drug combination data, it has some understandable friction points. Consider the following example:

```{r}
# Dose series: zero plus a two-fold dilution series from 2^-3 to 2^3
concentrations <- c(0,2^(-3:3))
# Full checkerboard of dose pairs, each measured in triplicate
surface <- data.frame(
  concA = rep(rep(concentrations,each=length(concentrations)),each=3),
  concB = rep(rep(concentrations,times=length(concentrations)),each=3),
  replicate = rep(c(1,2,3),times=(length(concentrations)^2))
)
# Noise-free BRAID surface (the fifth parameter, kappa, is 2)
surface$actual <- evalBraidModel(
  surface$concA,
  surface$concB,
  c(1, 1, 3, 3, 2, 0, 100, 100, 100)
)
# Simulated measurements: the surface plus Gaussian noise
surface$measure <- surface$actual + rnorm(nrow(surface),sd=7)

head(surface, 12)
```

This synthetic dataset reflects a fairly typical combination study layout: both drugs have been tested at a range of serially diluted concentrations, and combined doses have been laid out in a checkerboard so that every combination of a concentration of the first drug (including 0) and a concentration of the second drug (including 0) has been tested. Furthermore, each combined dose is tested in triplicate. The measured response surface has been simulated using a BRAID response surface model with a synergistic $\kappa$ value of 2, but any smoothly varying, noisily measured function of two doses would work just as well.
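The checkerboard layout described above can be verified with a few lines of base R. The sketch below rebuilds only the dose layout (no model evaluation), and the name `layout` is introduced here purely for illustration:

```r
# Rebuild only the dose layout used in the vignette (no BRAID model needed)
concentrations <- c(0, 2^(-3:3))
layout <- data.frame(
  concA = rep(rep(concentrations, each = length(concentrations)), each = 3),
  concB = rep(rep(concentrations, times = length(concentrations)), each = 3),
  replicate = rep(c(1, 2, 3), times = length(concentrations)^2)
)

# Count the rows that fall on each (concA, concB) pair
counts <- table(layout$concA, layout$concB)

dim(counts)        # one row and one column per concentration, zero included
all(counts == 3)   # TRUE when every dose pair is measured in triplicate
```

A check like this is a cheap way to confirm that a real dataset is a complete checkerboard before choosing a plotting approach.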
How might we visualize such a response surface using `ggplot2`? A 3-D plot is right out: `ggplot2` has no standard support for such plots, nor should it, as 3-D plots are notoriously poor at conveying accurate quantitative information. A more straightforward approach would be to plot our measured effect as a function of one concentration, conditioned on the level of the other. For example:

```{r, warning=FALSE}
ggplot(surface,aes(x=concA,y=measure,colour=factor(concB)))+
  geom_point()+
  stat_summary(geom="line",fun.data=mean_se)+
  scale_x_log10()+
  labs(x="Drug A",y="Effect",colour="Drug B")
```

This view is quite effective, and gives a clear sense of how the presence of differing amounts of drug B affects the behavior of drug A. But this plotting approach necessarily treats the two drugs differently in how they are visualized, and is ineffective at showing the shape of their interaction.

The most intuitive way to present combination data is with a response surface heatmap: plot the two doses as the x- and y-dimensions, and visualize their effect using a suitable color scale. Unfortunately, our dataset includes multiple measurements for each combination; were we to plot these using something like `geom_tile()`, each replicate would be plotted over the others, so that only the last replicate would be visible. `ggplot2` does contain one way to address this out of the box, `stat_summary_2d()`, but it doesn't behave exactly as we'd like:

```{r, warning=FALSE}
ggplot(surface, aes(x=concA,y=concB))+
  stat_summary_2d(aes(z=measure), fun="mean")+
  scale_x_log10()+
  scale_y_log10()+
  scale_fill_distiller(palette="RdYlBu")+
  coord_equal()+
  labs(x="Drug A",y="Drug B",fill="Effect")
```

Not only are our evenly spaced concentration pairs reduced to awkwardly rounded mini-tiles, but all measurements where either drug is zero have been removed from the plot altogether.
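The disappearance of the zero-dose measurements can be traced to the log transform itself. A quick base R check, independent of any plotting package, shows what happens to the dose series from this vignette:

```r
concentrations <- c(0, 2^(-3:3))

# Apply the same transform that scale_x_log10() and scale_y_log10() use
transformed <- log10(concentrations)

transformed              # the zero dose becomes -Inf; the rest are finite
is.finite(transformed)   # FALSE only for the zero dose
diff(transformed[-1])    # adjacent two-fold dilutions sit log10(2) apart
```
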
This is an intrinsic property of nearly all `ggplot2` stats and geoms: if a transformed coordinate is infinite, it is removed from the plot. Yet plotting serially diluted concentrations on a logarithmic scale is so natural that this behavior makes comparing non-zero and zero concentrations frustratingly difficult. It is for these reasons that we developed `geom_braid()`:

```{r, warning=FALSE}
ggplot(surface,aes(x=concA,y=concB))+
  geom_braid(aes(fill=measure))+
  scale_x_log10()+
  scale_y_log10()+
  scale_fill_distiller(palette="RdYlBu")+
  coord_equal()+
  labs(x="Drug A",y="Drug B",fill="Effect")
```

Like many `ggplot2` extensions, `geom_braid()` is in reality a `Stat`, performing some useful preprocessing of the data before passing it off to the true geom, `geom_tile()`. Duplicate concentration pairs are identified and averaged to give the aggregate value at each unique pair. Widths between dose pairs are guessed or provided. Coordinates which, after being transformed, would be infinite are instead offset from the main body of measurements, ensuring they will be plotted along with the data. The result is a simple and intuitive tool for quickly examining combined action data.

## Customizing BRAID Heatmaps

As a `ggplot2` extension, `geom_braid()` leverages all of the customization and flexibility `ggplot` objects traditionally afford, including integration with other layers, `ggplot2` fill scales, and faceting:

```{r, warning=FALSE}
ggplot(surface,aes(x=concA,y=concB))+
  geom_braid(aes(fill=measure))+
  geom_point(colour="black")+
  scale_x_log10("Drug A")+
  scale_y_log10("Drug B")+
  scale_fill_viridis_c("Effect",option="A")+
  coord_equal()+
  facet_wrap(vars(replicate))
```

Widths and heights of tiles can (and generally should) be generated automatically, but can be passed to the stat as additional aesthetics if desired.
Note that widths and heights should be expressed in the *transformed* coordinate space:

```{r, warning=FALSE}
surface$tilewidth <- log10(2)*0.9
surface$tilewidth[surface$concA==0] <- log10(2)/2
surface$tileheight <- log10(2)*0.9
surface$tileheight[surface$concB==0] <- log10(2)/2

ggplot(surface,aes(x=concA,y=concB))+
  geom_braid(aes(fill=measure,width=tilewidth,height=tileheight),space=3)+
  scale_x_log10("Drug A")+
  scale_y_log10("Drug B")+
  scale_fill_distiller("Effect",palette="RdYlBu")+
  coord_equal()
```

## Stained Glass Surfaces

`geom_braid()` is an effective tool for rendering classic checkerboard layouts, but there is no guarantee that data will be laid out so cleanly. Experiments might only measure a subset of a traditional checkerboard, or might select points on an even denser arrangement. For example, the following adjustment simulates an experiment in which measurements from replicates 2 and 3 have increased the concentrations of drug A and drug B respectively. This removes the traditional "triplicate" approach in favor of a more varied, full-coverage method:

```{r}
glassSurface <- surface
glassSurface$concA[glassSurface$replicate==2] <-
  glassSurface$concA[glassSurface$replicate==2]*1.25
glassSurface$concB[glassSurface$replicate==3] <-
  glassSurface$concB[glassSurface$replicate==3]*1.25
glassSurface$actual <- evalBraidModel(
  glassSurface$concA,
  glassSurface$concB,
  c(1, 1, 3, 3, -0.5, 0, 60, 100, 100)
)
glassSurface$measure <- glassSurface$actual+rnorm(nrow(glassSurface),sd=7)

head(glassSurface, 12)
```

Due to its irregular spacing, plotting `glassSurface` with `geom_braid()` produces a very unsatisfactory plot:

```{r, warning=FALSE}
ggplot(glassSurface,aes(x=concA,y=concB))+
  geom_braid(aes(fill=measure))+
  geom_point(colour="black")+
  scale_x_log10("Drug A")+
  scale_y_log10("Drug B")+
  scale_fill_distiller("Effect",palette="RdYlBu")+
  coord_equal()
```

While this plot is *technically* correct, it fails to fill the space in the way a response surface should.
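The irregularity can also be seen numerically. The following base R sketch rebuilds only the drug A dose values implied by the adjustment above (the names `concA_values` and `gaps` are introduced here for illustration) and measures the gaps between neighboring doses on the log10 axis:

```r
concentrations <- c(0, 2^(-3:3))

# Replicate 2 scales drug A by 1.25, so drug A now takes both sets of values
concA_values <- sort(unique(c(concentrations, concentrations * 1.25)))

# Gaps between neighboring non-zero doses in transformed (log10) space
gaps <- diff(log10(concA_values[concA_values > 0]))

round(gaps, 3)   # alternates between log10(1.25) and log10(1.6)
min(gaps)        # the smallest spacing between distinct doses
```

On the original checkerboard every gap equals `log10(2)`; here the gaps alternate between two different sizes, so no single tile width fits every dose.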
When dealing with such irregular sampling, a better tool is `geom_braid_glass()`, which produces what we call a "stained glass" plot:

```{r, warning=FALSE}
ggplot(glassSurface,aes(x=concA,y=concB))+
  geom_braid_glass(aes(fill=measure))+
  geom_point(colour="black")+
  scale_x_log10("Drug A")+
  scale_y_log10("Drug B")+
  scale_fill_distiller("Effect",palette="RdYlBu")+
  coord_equal()
```

In a stained glass plot, every point within the bounds of the plotted values is colored according to the value of the measured dose pair nearest to it, producing a mosaic of Voronoi cells that cover the full space. Values in the margins of the plot are given a height or width according to the specified or inferred aesthetic, but the boundaries between them are again bisecting Voronoi boundaries. The ability to customize the width of the resulting tiles, particularly in the margins, can be even more valuable in these plots, where the width and height default to the smallest spacing between distinct values:

```{r, warning=FALSE}
ggplot(glassSurface,aes(x=concA,y=concB))+
  geom_braid_glass(aes(fill=measure,width=tilewidth,height=tileheight),space=2)+
  scale_x_log10("Drug A")+
  scale_y_log10("Drug B")+
  scale_fill_distiller("Effect",palette="RdYlBu")+
  coord_equal()
```

## Smoothed BRAID Response Surfaces

While the discrete heatmaps of `geom_braid` and `geom_braid_glass` are the most direct ways to visualize combined action data, the harsh polygonal edges introduced can sometimes mask the more important variations in shape and structure.
`braidReports` also includes a `ggplot` geom for rendering smoothed surfaces, unsurprisingly named `geom_braid_smooth`:

```{r, warning=FALSE}
ggplot(surface,aes(x=concA,y=concB))+
  geom_braid_smooth(aes(fill=measure))+
  scale_x_log10()+
  scale_y_log10()+
  scale_fill_distiller(palette="RdYlBu")+
  coord_equal()+
  labs(x="Drug A",y="Drug B",fill="Effect")
```

`geom_braid_smooth` interpolates a regular grid of values from the original (potentially irregular) data using a two-dimensional Gaussian kernel, producing a smoothed surface that generally hews extremely close to measured values at their respective points, but produces intuitive, smoothly varying values in between them. It can be run on both regularly laid-out checkerboard data and more irregularly spaced measurements, though in the latter case specifying the smoothing width and height explicitly using the `width` and `height` aesthetics is often advisable:

```{r, warning=FALSE}
ggplot(glassSurface,aes(x=concA,y=concB))+
  geom_braid_smooth(aes(fill=measure))+
  geom_point(colour="black")+
  scale_x_log10("Drug A")+
  scale_y_log10("Drug B")+
  scale_fill_distiller("Effect",palette="RdYlBu")+
  coord_equal()

ggplot(glassSurface,aes(x=concA,y=concB))+
  geom_braid_smooth(aes(fill=measure,width=log10(2),height=log10(2)))+
  scale_x_log10("Drug A")+
  scale_y_log10("Drug B")+
  scale_fill_distiller("Effect",palette="RdYlBu")+
  coord_equal()

ggplot(glassSurface,aes(x=concA,y=concB))+
  geom_braid_smooth(aes(fill=measure,width=tilewidth,height=tileheight),space=2)+
  scale_x_log10("Drug A")+
  scale_y_log10("Drug B")+
  scale_fill_distiller("Effect",palette="RdYlBu")+
  coord_equal()
```

## Response Surface Contours

Heatmaps are an intuitive and effective way of depicting the results and shape of a combined response surface, but they still fall short when it comes to quantitative depiction.
Even the best-designed colormap still carries considerable imprecision, making it difficult to perceive the doses and ranges at which particular numerical values are reached. One effective tool for this task is the contour map, which marks the boundaries in dose space at which a given effect level is crossed. To support such plots, we have included `geom_braid_contour()`, which uses the same smoothing techniques as `geom_braid_smooth()` to produce an array of x-, y-, and z-values for the built-in `ggplot2` stat, `stat_contour()`. This allows us to visualize both the overall shape of the surface *and* the boundaries of particular effect spaces:

```{r, warning=FALSE}
ggplot(surface,aes(x=concA,y=concB))+
  geom_braid_smooth(aes(fill=measure))+
  geom_braid_contour(aes(z=measure),breaks=10*(1:9),colour="black",linetype=2)+
  scale_x_log10()+
  scale_y_log10()+
  scale_fill_distiller(palette="RdYlBu")+
  coord_equal()+
  labs(x="Drug A",y="Drug B",fill="Effect")
```

Note that `geom_braid_contour()` uses the smoothed and interpolated values like those produced by `geom_braid_smooth()` rather than the discrete values plotted by `geom_braid()` or `geom_braid_glass()`. There are two reasons for this: first, the underlying `Stat`, `stat_contour()`, requires that the data plotted be laid out as a regular grid for it to perform its *own* interpolation. Second, contours linearly interpolated between more sparsely sampled data are often disjointed and jagged, and carry much less information than more smoothly interpolated values.
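To make the "smooth onto a regular grid, then contour" idea concrete, here is a minimal base R sketch. It is not the `braidReports` implementation: the sample data are made up, the bandwidth `h` and the 30-by-30 grid are arbitrary assumptions, `kernel_smooth()` is a hypothetical helper, and `grDevices::contourLines()` stands in for the contouring that `stat_contour()` performs inside `ggplot2`.

```r
# Irregularly placed samples in log10 dose space; values are made up
set.seed(1)
pts <- data.frame(x = runif(60, -1, 1), y = runif(60, -1, 1))
pts$z <- 100 / (1 + 10^(-(pts$x + pts$y))) + rnorm(60, sd = 3)

# Hypothetical helper: Gaussian kernel-weighted average of the samples,
# evaluated at a single grid node (gx, gy); h is an assumed bandwidth
kernel_smooth <- function(gx, gy, pts, h = 0.3) {
  w <- exp(-((pts$x - gx)^2 + (pts$y - gy)^2) / (2 * h^2))
  sum(w * pts$z) / sum(w)
}

# Evaluate the smoother on a regular grid, which contouring requires
gx <- seq(-1, 1, length.out = 30)
gy <- seq(-1, 1, length.out = 30)
gz <- outer(gx, gy, Vectorize(function(a, b) kernel_smooth(a, b, pts)))

# Extract the 50% effect contour from the gridded values
iso <- grDevices::contourLines(gx, gy, gz, levels = 50)
```

Each element of `iso` is a list holding a `level` and the `x` and `y` coordinates of one contour segment. Because the grid is dense and the values vary smoothly, the extracted line is far less jagged than one interpolated directly between the 60 scattered samples would be.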
As with `geom_braid_smooth()`, `geom_braid_contour()` can be applied to both regular and irregular data, but with irregular data more care must be taken with the smoothing widths and heights:

```{r, warning=FALSE}
ggplot(glassSurface,aes(x=concA,y=concB))+
  geom_braid_smooth(aes(fill=measure))+
  geom_point(colour="black")+
  geom_braid_contour(aes(z=measure),breaks=10*(1:9),colour="black",linetype=2)+
  scale_x_log10("Drug A")+
  scale_y_log10("Drug B")+
  scale_fill_distiller("Effect",palette="RdYlBu")+
  coord_equal()

ggplot(glassSurface,aes(x=concA,y=concB))+
  geom_braid_smooth(aes(fill=measure,width=tilewidth,height=tileheight),space=2)+
  geom_braid_contour(aes(z=measure,width=tilewidth,height=tileheight),space=2,
                     breaks=10*(1:9),colour="black",linetype=2)+
  scale_x_log10("Drug A")+
  scale_y_log10("Drug B")+
  scale_fill_distiller("Effect",palette="RdYlBu")+
  coord_equal()
```

It should also be noted that while we have plotted contours and smoothed surfaces together here, this is only to highlight their connection to the underlying interpolated data. BRAID contours can be plotted on their own, which can be quite effective for comparing the results of different surfaces:

```{r, warning=FALSE}
surface$type <- "Synergy"
glassSurface$type <- "Antagonism"
allSurface <- rbind(surface,glassSurface)
allSurface$type <- factor(allSurface$type,c("Synergy","Antagonism"))

ggplot(allSurface,aes(x=concA,y=concB,colour=type))+
  geom_point()+
  geom_braid_contour(aes(z=measure,width=tilewidth,height=tileheight),
                     breaks=c(50,90), tight=TRUE)+
  scale_x_log10()+
  scale_y_log10()+
  scale_color_brewer("Surface Type",palette="Set1")+
  coord_equal()+
  labs(x="Drug A",y="Drug B")
```

## A Word About Warnings

Astute `ggplot2` observers will note that at several points throughout this vignette, we have run standard `ggplot2` geoms such as `geom_point()` on logarithmically transformed data that result in infinite (transformed) values.
Ordinarily, this would produce a warning, and in fact it produces one here as well. We have suppressed warnings for this vignette because, while the `braid`, `braid_glass`, and `braid_smooth` geoms all explicitly handle such infinite transformed values, we have been unable to find a way to suppress these built-in warnings from within the package. The warnings are raised not while a stat is running but *before* its functions are called, and as such are, for the time being, unavoidable. When running a BRAID plotting function yourself, you will encounter these warnings as well:

```{r}
# With warnings enabled...
ggplot(surface,aes(x=concA,y=concB))+
  geom_braid(aes(fill=measure))+
  scale_x_log10()+
  scale_y_log10()+
  scale_fill_distiller(palette="RdYlBu")+
  coord_equal()+
  labs(x="Drug A",y="Drug B",fill="Effect")
```

While we feel that this is an unfortunate reality, we hope that it is a small enough inconvenience that the overall versatility and expressiveness of the BRAID plotting functions outweigh it.
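In your own scripts, one possible workaround is to muffle only the warnings you expect while letting everything else through. The sketch below uses base R condition handling; `muffle_matching()` is a hypothetical helper (not part of `braidReports` or `ggplot2`), `noisy()` is a stand-in for printing a plot, and the pattern `"infinite values"` is an assumption about the warning text, which may differ between `ggplot2` versions.

```r
# Hypothetical helper: evaluate `expr`, muffling only those warnings
# whose message matches `pattern`
muffle_matching <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w))) invokeRestart("muffleWarning")
    }
  )
}

# Stand-in for printing a plot: raises the kind of warning we expect
noisy <- function() {
  warning("transformation introduced infinite values")
  "done"
}

# The expected warning is muffled and the value is returned as usual
result <- muffle_matching(noisy(), "infinite values")

# Unrelated warnings still reach the caller
other <- tryCatch(
  muffle_matching(warning("something else"), "infinite values"),
  warning = function(w) "not muffled"
)
```

With a plot stored in an object `p`, the same helper could wrap the explicit `print(p)` call, since `ggplot2` raises these warnings when the plot is built for display. Inside an R Markdown document, the `warning=FALSE` chunk option used throughout this vignette achieves a similar effect for a whole chunk.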