12  ggoncoplot

The ggoncoplot() function generates oncoPrint visualizations that display genetic alterations in a matrix format. This function is especially useful for visualizing complex genomic data, such as mutations, copy number variations, and other genomic alterations in cancer research.

Code
library(ggalign)
#> Loading required package: ggplot2

12.1 Input data

The input should be a character matrix which encodes the alterations, you can use string of ";", ":", ",", and "|" to separate multiple alterations. Internally, ggoncoplot() will use fortify_matrix() to get such matrix.

mat <- read.table(
    textConnection(
        "s1,s2,s3
         g1,snv;indel,snv,indel
         g2,,snv;indel,snv
         g3,snv,,indel;snv"
    ),
    row.names = 1, header = TRUE, sep = ",", stringsAsFactors = FALSE
)
mat
#>                    s1        s2        s3
#>          g1 snv;indel       snv     indel
#>          g2           snv;indel       snv
#>          g3       snv           indel;snv

A basic oncoprint can be generated as follows:

ggoncoplot(mat)

12.2 oncoPrint Customization

By default, all alterations are represented with tiles of equal width and height, which may lead to overlapping. You can control the width and height of the tiles using the map_width and map_height arguments (we will introduce another more effective ways to handle this in the Specialized Geoms section):

ggoncoplot(mat, map_width = c(snv = 0.5), map_height = c(indel = 0.9))

By default, all empty string will be converted to NA value and ggplot2 will translate the NA values and render it in the legend. To prevent this, you can turn off the translation using na.translate = FALSE:

ggoncoplot(mat, map_width = c(snv = 0.5), map_height = c(indel = 0.9)) +
    scale_fill_brewer(palette = "Dark2", na.translate = FALSE)

The function automatically reorders rows and columns based on alteration frequency. Disable this with reorder_row = FALSE and reorder_column = FALSE.

You can further customize layouts using methods from vignette("layout-customize").

ggoncoplot(mat,
    map_width = c(snv = 0.5), map_height = c(indel = 0.9),
    reorder_row = FALSE, reorder_column = FALSE
) +
    scale_fill_brewer(palette = "Dark2", na.translate = FALSE)

12.3 Advanced Data Handling

ggoncoplot() is built on ggheatmap() with a default scheme_data() (see vignette("plot-options") for details), which splits alterations into separate entries for visualization. (See ggplot2 specification section in ggheatmap() for the input data of scheme_data()):

pdata <- function(data) {
    tidyr::separate_longer_delim(data,
        cols = value,
        delim = stringr::regex("\\s*[;:,|]\\s*")
    )
}
# Note: this figure will contain an empty string value.
#
# `ggoncoplot()` will automatically convert any empty strings to `NA`
# before pass it to `ggheatmap`.
ggheatmap(mat) -
    scheme_data(pdata) +
    scale_fill_brewer(palette = "Dark2", na.translate = FALSE)
#> → heatmap built with `geom_tile()`

By default, the scheme_data() is inherited from the parent layout if the data of the plot is inherited from the layout. You can apply the parent scheme_data() first and then apply another transformation by setting inherit = TRUE. This functionality is especially useful when working with ggoncoplot().

ggoncoplot(mat, map_width = c(snv = 0.5), map_height = c(indel = 0.9)) +
    anno_top(size = 0.2) +
    ggalign() +
    # by setting `inherti = TRUE`, we apply the parent layout `scheme_data()`
    # (from the `ggoncoplot()` layout) firstly, which will split the alteration
    # string and unnested the columns.
    # Here: We then remove `NA` value
    scheme_data(~ subset(.x, !is.na(value)), inherit = TRUE) +
    geom_bar(aes(.x, after_stat(count), fill = value)) +
    # note: `ggoncoplot()` use `geom_tile()` to draw the oncoPrint,
    # the guide is different from `geom_bar()`, though both looks
    # like the same, we remove the guide of `geom_bar()` plot
    guides(fill = "none") &
    scale_fill_brewer(palette = "Dark2", na.translate = FALSE)

12.4 Integration with maftools

The ggoncoplot() function supports MAF objects from the maftools package using the fortify_matrix.MAF() method. It includes options to control data generation for ggoncoplot(), such as drawing the top n_top genes.

# load data from `maftools`
laml.maf <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
# clinical information containing survival information and histology. This is optional
laml.clin <- system.file("extdata", "tcga_laml_annot.tsv", package = "maftools")
laml <- maftools::read.maf(
    maf = laml.maf,
    clinicalData = laml.clin,
    verbose = FALSE
)
# Visualizing the Top 20 Genes
ggoncoplot(laml, n_top = 20) +
    scale_fill_brewer(palette = "Dark2", na.translate = FALSE) +
    theme_no_axes("x")

By default, samples without alterations in the selected genes are removed. To include all samples, set remove_empty_samples = FALSE:

align_plots(
    ggoncoplot(laml, n_top = 20L, remove_empty_samples = FALSE) +
        scale_fill_brewer(palette = "Dark2", na.translate = FALSE) +
        theme_no_axes("x") +
        ggtitle("Keep empty samples"),
    ggoncoplot(laml, n_top = 20L, remove_empty_samples = TRUE) +
        scale_fill_brewer(palette = "Dark2", na.translate = FALSE) +
        theme_no_axes("x") +
        ggtitle("Remove empty samples"),
    ncol = 1L,
    guides = "tlbr"
)

12.5 Specialized Geoms

12.5.1 geom_subtile()

When multiple alterations occur in the same sample and gene, they are combined into a single value, "Multi_Hit", by default. To visualize these alterations separately, you can set collapse_vars = FALSE. However, doing so can lead to overlapping alterations within the same cell, making the visualization cluttered and hard to interpret.

In such cases, disabling the default filling and defining a custom heatmap layer with geom_subtile() is more effective. This function subdivides each cell into smaller rectangles, allowing the distinct alterations to be clearly displayed.

ggoncoplot(laml, n_top = 20, collapse_vars = FALSE, filling = FALSE) +
    geom_subtile(aes(fill = value), direction = "vertical") +
    scale_fill_brewer(palette = "Dark2", na.translate = FALSE) +
    theme_no_axes("x")
#> `geom_subtile()` subdivide tile into a maximal of 3 rectangles

We focus exclusively on cells with multiple alterations to highlight the differences, by filtering the data before applying geom_subtile():

ggoncoplot(laml, n_top = 20, collapse_vars = FALSE, filling = FALSE) +
    scheme_data(function(data) {
        dplyr::filter(data, dplyr::n() > 1L, .by = c(".x", ".y"))
        # we apply the parent layout `scheme_data()` first (`inherit = TRUE`),
        # which will split the alteration string and unnested the columns.
    }, inherit = TRUE) +
    geom_subtile(aes(fill = value), direction = "vertical") +
    scale_fill_brewer(palette = "Dark2", na.translate = FALSE) +
    theme_no_axes("x")
#> `geom_subtile()` subdivide tile into a maximal of 3 rectangles

12.5.2 geom_draw()

geom_subtile() often suffices for most scenarios. However, if you require a strategy similar to that of ComplexHeatmap, consider using geom_draw(), which offers greater flexibility for complex customizations.

Key Features of geom_draw():

  • Custom Drawing Logic: Accepts a draw aesthetic, allowing each cell to be rendered as a specific graphical object (grob).
  • Manual Scaling: Requires scale_draw_manual() to map draw values to corresponding drawing functions.
  • Flexible Aesthetic Mapping: Functions mapped through scale_draw_manual() can utilize any number of ggplot2 aesthetics and output custom graphical objects (grob). Beyond the x, y, xmin, xmax, width, and height aesthetics, you may want to rely solely on mapped aesthetics (aes()). Unmapped aesthetics will default to ggplot2’s behavior, which might not produce the desired outcome. And always use native unit.

:) Sorry, I cannot deal with a pretty example for this, but the function provided in values argument of scale_draw_manual operates similarly to alter_fun in ComplexHeatmap. Currently, geom_subtile() performs well enough for most cases, making geom_draw() somewhat cumbersome to use. Managing alteration types, especially in the case of overlapping alterations, requires meticulous design to ensure that the visual elements do not interfere with each other.

library(grid)
draw_mapping <- list(
    function(x, y, width, height, fill) {
        rectGrob(x, y,
            width = width, height = height,
            gp = gpar(fill = fill),
            default.units = "native"
        )
    },
    function(x, y, width, height, fill) {
        rectGrob(x, y,
            width = width, height = height,
            gp = gpar(fill = fill),
            default.units = "native"
        )
    },
    function(x, y, width, height, fill) {
        rectGrob(x, y,
            width = width, height = height,
            gp = gpar(fill = fill),
            default.units = "native"
        )
    },
    function(x, y, width, height, shape) {
        gList(
            pointsGrob(x, y, pch = shape),
            # To ensure the rectangle color is shown in the legends, you
            # must explicitly provide a color argument and include it in
            # the `gpar()` of the graphical object
            rectGrob(x, y, width, height,
                gp = gpar(col = "black", fill = NA)
            )
        )
    },
    function(xmin, xmax, ymin, ymax) {
        segmentsGrob(
            xmin, ymin,
            xmax, ymax,
            gp = gpar(lwd = 2)
        )
    }
)
value <- sample(letters, 5L)
ggplot(data.frame(value = value, y = seq_len(5))) +
    geom_draw(aes(x = 1, y = y, draw = value, fill = value)) +
    scale_draw_manual(values = draw_mapping) +
    scale_fill_brewer(palette = "Dark2")
ggplot(data.frame(value = c(value, value[5L]), y = c(1, 2, 3, 1, 2, 3))) +
    geom_draw(aes(x = 1, y = y, draw = value, fill = value)) +
    scale_draw_manual(values = draw_mapping) +
    scale_fill_brewer(palette = "Dark2")

12.6 ggalign attributes

Except for the data used for the main plot, fortify_matrix.MAF() also attaches several useful attributes.

  • gene_summary: gene summary informations. See maftools::getGeneSummary() for details.
  • sample_summary: sample summary informations. See maftools::getSampleSummary() for details.
  • sample_anno: sample clinical informations. See maftools::getClinicalData() for details.
  • n_genes: Total of genes.
  • n_samples: Total of samples.
  • titv: A list of data.frames with Transitions and Transversions summary. See maftools::titv() for details.

You can extract these attributes using the ggalign_attr() function. Below is an example of how to extract the sample summary data (e.g., TMB) using ggalign_attr() and add it to the top annotation of an oncoplot:

ggoncoplot(laml, n_top = 20, collapse_vars = FALSE, filling = FALSE) +
    geom_subtile(aes(fill = value), direction = "vertical") +
    theme_no_axes("x") +
    anno_top(size = 0.2) +
    ggalign(data = function(data) {
        data <- ggalign_attr(data, "sample_summary")
        # matrix input will be automatically melted into a long foramted data
        # frame in `ggalign()` function.
        as.matrix(data[2:(ncol(data) - 1L)])
    }) +
    geom_bar(aes(.x, value, fill = .column_names),
        stat = "identity"
    ) +
    ylab("TMB") &
    scale_fill_brewer(palette = "Dark2", na.translate = FALSE)
#> `geom_subtile()` subdivide tile into a maximal of 3 rectangles

We can draw the example from maftools vignette.

ggoncoplot(laml, n_top = 20, collapse_vars = FALSE, filling = FALSE) +
    geom_subtile(aes(fill = value), direction = "vertical") +
    theme_no_axes("x") +
    # since legends from geom_tile (oncoPrint body) and `geom_bar`
    # is different, though both looks like the same, the internal
    # won't merge the legends. we remove the legends of oncoPrint body
    guides(fill = "none") +
    # add top annotation
    anno_top(size = 0.2) +
    ggalign(data = function(data) {
        data <- ggalign_attr(data, "sample_summary")
        # matrix input will be automatically melted into a long foramted data
        # frame in `ggalign()` function.
        as.matrix(data[2:(ncol(data) - 1L)])
    }) +
    geom_bar(aes(.x, value, fill = .column_names),
        stat = "identity"
    ) +
    ylab("TMB") +
    # add right annotation
    anno_right(size = 0.2) -
    # remove bottom spaces of the right annotation when aligning
    scheme_align(free_spaces = "b") +
    # add the text percent for the alterated samples in the right annotation
    ggalign(data = function(data) {
        # Atomic vector will be put in the `value` column of the data frame.
        ggalign_attr(data, "gene_summary")$AlteredSamples /
            ggalign_attr(data, "n_samples")
    }) +
    geom_text(aes(1, label = scales::label_percent()(value)), hjust = 1) +
    scale_x_continuous(
        expand = expansion(),
        name = NULL, breaks = NULL,
        limits = c(0, 1)
    ) +
    theme(plot.margin = margin()) +
    # add the bar plot in the right annotation
    ggalign(data = function(data) {
        data <- ggalign_attr(data, "gene_summary")
        # matrix input will be automatically melted into a long foramted data
        # frame in `ggalign()` function.
        as.matrix(data[2:8])
    }) +
    geom_bar(aes(value, fill = .column_names),
        stat = "identity",
        orientation = "y"
    ) +
    xlab("No. of samples") -
    # we apply the scale mapping to the top and right annotation: `position = "tr"`
    # and the main plot: `main = TRUE`
    with_quad(
        scale_fill_brewer("Mutations",
            palette = "Dark2", na.translate = FALSE
        ),
        position = "tr",
        main = TRUE
    ) +
    # add bottom annotation
    anno_bottom(size = 0.2) +
    # add bar plot in the bottom annotation
    ggalign(data = function(data) {
        data <- ggalign_attr(data, "titv")$fraction.contribution
        # matrix input will be automatically melted into a long foramted data
        # frame in `ggalign()` function.
        as.matrix(data[2:7])
    }) +
    geom_bar(aes(y = value, fill = .column_names), stat = "identity") +
    ylab("Ti/Tv") +
    scale_fill_brewer("Ti/Tv", palette = "Set2")
#> `geom_subtile()` subdivide tile into a maximal of 3 rectangles
#> Warning: Removed 24 rows containing missing values or values outside the scale range
#> (`geom_bar()`).

12.7 Integration with GISTIC results from maftools

The package also includes a fortify_matrix.GISTIC() method designed to handle GISTIC objects from the maftools package. This allows you to seamlessly apply the same operations to visualize GISTIC results. The following ggalign attributes are generated as part of the analysis:

  • sample_anno: sample clinical informations provided in sample_anno argument.
  • sample_summary: sample copy number summary informations. See data@cnv.summary for details.
  • cytoband_summary: cytoband summary informations. See data@cytoband.summary for details.
  • gene_summary: gene summary informations. See data@gene.summary for details.
  • summary: A data frame of summary information. See data@summary for details.
# Ensure the maftools package is installed and load the example GISTIC data
all.lesions <- system.file("extdata", "all_lesions.conf_99.txt", package = "maftools")
amp.genes <- system.file("extdata", "amp_genes.conf_99.txt", package = "maftools")
del.genes <- system.file("extdata", "del_genes.conf_99.txt", package = "maftools")
scores.gistic <- system.file("extdata", "scores.gistic", package = "maftools")
laml.gistic <- maftools::readGistic(
    gisticAllLesionsFile = all.lesions, gisticAmpGenesFile = amp.genes,
    gisticDelGenesFile = del.genes, gisticScoresFile = scores.gistic
)
#> -Processing Gistic files..
#> --Processing amp_genes.conf_99.txt
#> --Processing del_genes.conf_99.txt
#> --Processing scores.gistic
#> --Summarizing by samples
ggoncoplot(laml.gistic) +
    scale_fill_brewer("CNV", palette = "Dark2", na.translate = FALSE) +
    theme_no_axes("x")