16  Annotate Observations

ggmark() adds an annotation plot for the selected observations. It accepts a mark argument, which should be a mark_draw() object that defines how to draw the links between the observations and the annotation plot.

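Currently, two helper functions are provided to generate these links:

  • mark_line(): connects the observations to the annotation panel with lines.
  • mark_tetragon(): connects the observations to the annotation panel with tetragons (quadrilaterals).

Both functions specify links with pair_links(), as introduced in Chapter 15. Each pair of links adds a panel to ggmark to annotate the corresponding observations.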

library(ggalign)
#> Loading required package: ggplot2
#> 
#> Attaching package: 'ggalign'
#> The following object is masked from 'package:ggplot2':
#> 
#>     element_polygon
set.seed(123)
small_mat <- matrix(rnorm(56), nrow = 7)
rownames(small_mat) <- paste0("row", seq_len(nrow(small_mat)))
colnames(small_mat) <- paste0("column", seq_len(ncol(small_mat)))

16.1 Plot Data

By default, if no observations are explicitly selected, ggmark() selects all observations and splits them based on the layout’s grouping.

Calling ggmark() initializes a ggplot object whose underlying data is created using fortify_data_frame(); refer to its documentation for details. In addition, the following columns are added to the data frame:

  • .panel: the panel for the aligned axis. This is the x-axis for vertical stack layouts (including top and bottom annotations) and the y-axis for horizontal stack layouts (including left and right annotations).
  • .names and .index: the character names (only available when names exist) and the integer index of the original data.
  • .hand: A factor with levels c("left", "right") for horizontal stack layouts, or c("top", "bottom") for vertical stack layouts, indicating the position of the linked observations.
set.seed(123)
ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(mark_line()) +
    geom_boxplot(aes(.names, value)) +
    theme(plot.margin = margin(l = 0.1, t = 0.1, unit = "npc"))
#> → heatmap built with `geom_tile()`

set.seed(123)
ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(mark_tetragon()) +
    geom_boxplot(aes(.names, value)) +
    theme(plot.margin = margin(l = 0.1, t = 0.1, unit = "npc"))
#> → heatmap built with `geom_tile()`

You can control the size (reach) of the annotation links by adjusting the plot margins through the plot.margin argument of theme().
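
As a rough illustration of the margin effect, the sketch below repeats the mark_line() example with a wider left margin (0.3 npc instead of 0.1). The value 0.3 and the name p_wide are arbitrary choices for this illustration; the expectation, not verified here, is that the extra margin leaves more room for the links and so lengthens them.

```r
library(ggalign)

# Same toy matrix as at the top of this chapter
set.seed(123)
small_mat <- matrix(rnorm(56), nrow = 7)
rownames(small_mat) <- paste0("row", seq_len(nrow(small_mat)))
colnames(small_mat) <- paste0("column", seq_len(ncol(small_mat)))

# Same layout as the examples above; only the left margin differs
set.seed(123)
p_wide <- ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(mark_line()) +
    geom_boxplot(aes(.names, value)) +
    theme(plot.margin = margin(l = 0.3, t = 0.1, unit = "npc"))
p_wide
```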

16.2 Selecting Observations

You can manually specify which observations to annotate. Note that the observations are based on data rows (not columns).

Note: only the data for the selected observations is retained.

set.seed(123)
ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(mark_line(1:3, c(3, 5:7))) +
    geom_boxplot(aes(.names, value)) +
    theme(plot.margin = margin(l = 0.1, t = 0.1, unit = "npc"))
#> → heatmap built with `geom_tile()`

set.seed(123)
ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(mark_tetragon(1:3, c(3, 5:7))) +
    geom_boxplot(aes(.names, value)) +
    theme(plot.margin = margin(l = 0.1, t = 0.1, unit = "npc"))
#> → heatmap built with `geom_tile()`

You can specify the names (rownames) of the observations:

set.seed(123)
ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(mark_line(paste0("row", 1:3), paste0("row", c(3, 5:7)))) +
    geom_boxplot(aes(.names, value)) +
    theme(plot.margin = margin(l = 0.1, t = 0.1, unit = "npc"))
#> → heatmap built with `geom_tile()`

set.seed(123)
ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(mark_tetragon(paste0("row", 1:3), paste0("row", c(3, 5:7)))) +
    geom_boxplot(aes(.names, value)) +
    theme(plot.margin = margin(l = 0.1, t = 0.1, unit = "npc"))
#> → heatmap built with `geom_tile()`
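
The index-based and name-based selections above refer to the same rows of small_mat. The following base R check (no ggalign involved; idx and nms are names chosen for this sketch) makes the correspondence explicit:

```r
# Same toy matrix as at the top of this chapter (column names omitted)
set.seed(123)
small_mat <- matrix(rnorm(56), nrow = 7)
rownames(small_mat) <- paste0("row", seq_len(nrow(small_mat)))

idx <- c(3, 5:7)                  # selection by row index
nms <- paste0("row", c(3, 5:7))   # selection by row name

# Translate the names back to row positions
match(nms, rownames(small_mat))
#> [1] 3 5 6 7

# Both forms pick exactly the same rows
identical(small_mat[idx, ], small_mat[nms, ])
#> [1] TRUE
```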

16.3 Add Layout Grouping Information

Use the group1 and group2 arguments to control whether the layout's panel groups, along with their ordering, are added as well.

set.seed(123)
ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(mark_line(1:3), group1 = TRUE) +
    geom_boxplot(aes(.names, value)) +
    theme(plot.margin = margin(l = 0.1, t = 0.1, unit = "npc"))
#> → heatmap built with `geom_tile()`

set.seed(123)
ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(mark_tetragon(1:3), group1 = TRUE) +
    geom_boxplot(aes(.names, value)) +
    theme(plot.margin = margin(l = 0.1, t = 0.1, unit = "npc"))
#> → heatmap built with `geom_tile()`

16.4 Facet

By default, ggmark() uses facet_wrap() to define the facets, and you can add facet_wrap() yourself to control the facet appearance (the facets argument is ignored). We prefer facet_wrap() here because it offers flexibility in positioning the strip on any side of the panel, and typically a single dimension is enough to annotate the selected observations. However, you can still use facet_grid() to create a two-dimensional plot. Note that the row facets (for horizontal stack layouts) or the column facets (for vertical stack layouts) will always be overwritten.

set.seed(123)
ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(mark_line()) +
    geom_boxplot(aes(.names, value, fill = .names)) +
    facet_wrap(vars(), scales = "free", strip.position = "right") +
    theme(plot.margin = margin(l = 0.1, t = 0.1, unit = "npc"))
#> → heatmap built with `geom_tile()`

16.6 mark_draw()

Both mark_line() and mark_tetragon() are built on top of mark_draw() (strictly, .mark_draw()). This function allows you to define custom mark styles by supplying a drawing function which must return a grob/gList.

The function passed to mark_draw() must accept two arguments:

  • A data frame representing the panel-side coordinates (ggmark plot)

  • A data frame representing the observation-side coordinates

Each observation is assumed to occupy a unit length in the layout. Therefore, the observation-side coordinates consist of two terminal points, (x, y) and (xend, yend), representing the start and end of the observation along the linking axis.

Additional columns in the link data frame include:

  • link_id: The identifier of the link (e.g., in the following example, the 4:6 link has id “a”).

  • link_panel: Indicates which panel the link is drawn to, based on the layout.

  • link_index: The layout index for positioning.

  • .hand: Either “left”/“right” (horizontal) or “top”/“bottom” (vertical), specifying the hand of the observation.

  • .index: The original index of the observation.
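
To make the two-argument contract concrete, here is a minimal sketch of a drawing function exercised outside ggalign. The function name draw_segments and the mock data frames are hypothetical; the coordinates are made up and only mimic the columns described above, assuming (as in the triangle example later in this section) that they are in npc units.

```r
library(grid)

# Hypothetical drawing function: one straight segment from the midpoint of
# the panel side to the midpoint of each observation
draw_segments <- function(panel, link) {
    segmentsGrob(
        x0 = (panel$x + panel$xend) / 2,
        y0 = (panel$y + panel$yend) / 2,
        x1 = (link$x + link$xend) / 2,
        y1 = (link$y + link$yend) / 2,
        default.units = "npc"
    )
}

# Mock inputs with made-up coordinates (not produced by ggalign)
panel <- data.frame(x = 0.03, xend = 0.03, y = 0.1, yend = 0.5)
link <- data.frame(
    x = 0, xend = 0,
    y = c(0.1, 0.3), yend = c(0.2, 0.4),
    link_id = "a", link_panel = 1L, link_index = 1:2,
    .hand = "left", .index = c(4L, 5L)
)

g <- draw_segments(panel, link)
is.grob(g) # the function returns a grob, as mark_draw() requires
```

Following the pattern used in this chapter, such a function would be supplied as the first argument of mark_draw(), followed by the links.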

Here is an example that prints the structure of the panel and link data frames:

set.seed(123)
p <- ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(
        mark_draw(function(panel, link) {
            print(panel)
            print(link)
        }, a = 4:6, 1:2)
    )
pdf(NULL)
print(p)
#> → heatmap built with `geom_tile()`
#>            x       xend          y      yend
#> 1 0.02720374 0.02720374 0.01235571 0.4938221
#>   x xend         y      yend link_id link_panel link_index .hand .index
#> 1 0    0 0.8606731 1.0000000       a          3          7  left      4
#> 2 0    0 0.1393269 0.2786539       a          1          2  left      5
#> 3 0    0 0.2786539 0.4179808       a          1          3  left      6
#>            x       xend         y      yend
#> 1 0.02720374 0.02720374 0.5061779 0.9876443
#>   x xend         y      yend link_id link_panel link_index .hand .index
#> 1 0    0 0.7213461 0.8606731       2          3          6  left      1
#> 2 0    0 0.5696635 0.7089904       2          2          5  left      2
invisible(dev.off())
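
The printed output can be used to check the unit-length statement: every observation of link "a" spans the same distance along the y-axis. The values below are copied by hand from the printout above, so the comparison only holds up to the printed precision.

```r
# Values copied from the printed `link` data frame for link "a"
link <- data.frame(
    y      = c(0.8606731, 0.1393269, 0.2786539),
    yend   = c(1.0000000, 0.2786539, 0.4179808),
    .index = 4:6
)

# Length occupied by each observation along the linking (y) axis
span <- link$yend - link$y
span

# All observations occupy the same length, up to printing precision
all(abs(span - span[1]) < 1e-6)
#> [1] TRUE
```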

Here, we draw a triangle to connect the midpoint of the panel side to the two ends of the selected observation:

set.seed(123)
ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(
        mark_draw(function(panel, link) {
            x <- c((panel$x + panel$xend) / 2L, link$x, link$xend)
            y <- c((panel$y + panel$yend) / 2L, link$y, link$yend)
            grid::polygonGrob(x, y)
        }, 4, 2) # selecting one observation from each group for simple example
    ) +
    geom_boxplot(aes(.names, value, fill = .names)) +
    facet_wrap(vars(), scales = "free", strip.position = "right") +
    theme(plot.margin = margin(l = 0.1, t = 0.1, unit = "npc"))
#> → heatmap built with `geom_tile()`
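
The triangle example selects a single observation per group. As a sketch of one possible generalization, the hypothetical function below draws one triangle per observation by passing id.lengths to grid::polygonGrob(). It assumes that panel has a single row, as in the printed output earlier in this section, and it is tested here with made-up coordinates rather than inside ggalign.

```r
library(grid)

# Hypothetical drawing function: one triangle per observation, each with its
# apex at the midpoint of the panel side (assumes `panel` has a single row)
triangle_per_obs <- function(panel, link) {
    n <- nrow(link)
    mid_x <- (panel$x + panel$xend) / 2
    mid_y <- (panel$y + panel$yend) / 2
    # rbind() + as.vector() interleaves the three vertices of each triangle:
    # apex, start of the observation, end of the observation
    x <- as.vector(rbind(rep(mid_x, n), link$x, link$xend))
    y <- as.vector(rbind(rep(mid_y, n), link$y, link$yend))
    polygonGrob(x, y, id.lengths = rep(3L, n))
}

# Mock inputs with made-up coordinates (not produced by ggalign)
panel <- data.frame(x = 0.03, xend = 0.03, y = 0.0, yend = 0.5)
link <- data.frame(
    x = 0, xend = 0,
    y = c(0.1, 0.3, 0.6), yend = c(0.2, 0.4, 0.7)
)

g <- triangle_per_obs(panel, link)
length(g$x) # three vertices for each of the three observations
#> [1] 9
```

Such a function could then be passed to mark_draw() in place of the anonymous function above, with several observations selected per link.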