6 Annotate observations

library(ggalign)
#> Loading required package: ggplot2
set.seed(123)
small_mat <- matrix(rnorm(56), nrow = 7)
rownames(small_mat) <- paste0("row", seq_len(nrow(small_mat)))
colnames(small_mat) <- paste0("column", seq_len(ncol(small_mat)))

To add an annotation plot for specific observations, we must first know how to select the observations to be linked.

6.2 ggmark()

ggmark() adds an annotation plot for the selected observations. Its mark argument takes a mark_draw() object, which defines how the links are drawn. Currently, two built-in functions, mark_line() and mark_tetragon(), can be used to quickly draw line and quadrilateral links, respectively, connecting the selected observations to the plot panel.

By default, when no observations are selected manually, ggmark() selects all observations and splits them according to the groups defined in the layout.
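To give a rough picture of this default, the base-R sketch below partitions the rows of small_mat into three groups with stats::kmeans() and splits the row names by group, so that each group would get its own set of links. This is only an illustrative stand-in, not the internals of ggmark() or align_kmeans(), and the resulting partition is not guaranteed to match the one drawn in the heatmaps below.

```r
# Illustrative sketch only: mimic "select all observations, then split them
# by the layout groups" with base R. This is NOT the ggalign implementation.
set.seed(123)
small_mat <- matrix(rnorm(56), nrow = 7)
rownames(small_mat) <- paste0("row", seq_len(nrow(small_mat)))

# Assign each row (observation) to one of 3 k-means clusters
clusters <- stats::kmeans(small_mat, centers = 3L)$cluster

# One list element per group, holding the names of its observations
selected <- split(rownames(small_mat), clusters)
selected
```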

The data underlying the ggplot object generated by ggmark() is similar to that of ggalign() (Section 5.1), except that it does not include the .x, .y, and .discrete_x/.discrete_y columns. Instead, it contains a special column named .hand, a factor with levels c("left", "right") for horizontal stack layouts or c("top", "bottom") for vertical stack layouts, which indicates the position of the linked observations.

Note: Only data for selected observations are retained.
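To make this shape concrete, here is a hypothetical toy data frame that carries only the columns used in this chapter's examples (.names and value) plus the .hand factor, containing rows for two selected observations only. The exact column set of the real ggmark() data is an assumption here, so inspect it on your own plot before relying on it.

```r
# Hypothetical toy data: NOT produced by ggmark(), just shaped like the
# description above (one row per observation/value pair plus `.hand`).
hand_levels <- c("left", "right")      # horizontal stack layouts
# hand_levels <- c("top", "bottom")    # vertical stack layouts

toy <- data.frame(
    .names = rep(c("row4", "row5"), each = 2),
    value  = c(0.1, -0.3, 1.2, 0.7),
    .hand  = factor(c("left", "left", "right", "right"), levels = hand_levels)
)

levels(toy$.hand)
# Keep only the rows linked on one side
subset(toy, .hand == "left")
```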

You can adjust the size of the area in which the links are drawn with the plot.margin theme element of the annotation plot, as in the examples below.

set.seed(123)
ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(mark_line()) +
    geom_boxplot(aes(.names, value)) +
    theme(plot.margin = margin(l = 0.1, t = 0.1, unit = "npc"))
#> → heatmap built with `geom_tile()`

set.seed(123)
ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(mark_tetragon()) +
    geom_boxplot(aes(.names, value)) +
    theme(plot.margin = margin(l = 0.1, t = 0.1, unit = "npc"))
#> → heatmap built with `geom_tile()`

If you manually provide the linked observations, you can use the group1 and group2 arguments to control whether the layout panel groups and their ordering should be used to create the annotations.

set.seed(123)
ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(mark_tetragon(1:3), group1 = TRUE) +
    geom_boxplot(aes(.names, value)) +
    theme(plot.margin = margin(l = 0.1, t = 0.1, unit = "npc"))
#> → heatmap built with `geom_tile()`

By default, ggmark() uses facet_wrap() to define the facets, and you can add facet_wrap() yourself to control the facet appearance (its facets argument is ignored, so vars() is enough). We prefer facet_wrap() here because it can place the strip on any side of the panel, and typically we only need a single dimension to annotate the selected observations. However, you can still use facet_grid() to create a two-dimensional layout. Note that the row facets (for horizontal stack layouts) or the column facets (for vertical stack layouts) will always be overwritten.

set.seed(123)
ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(mark_line()) +
    geom_boxplot(aes(.names, value, fill = .names)) +
    facet_wrap(vars(), scales = "free", strip.position = "right") +
    theme(plot.margin = margin(l = 0.1, t = 0.1, unit = "npc"))
#> → heatmap built with `geom_tile()`

You can further customize the appearance of the link lines and quadrilaterals with the .element argument of mark_line() and mark_tetragon():

  • Link lines can be customized with element_line().
  • Link quadrilaterals (ranges) can be customized with element_polygon().

By default, vectorized fields in element_line() and element_polygon() will be recycled to match the total number of groups.

set.seed(123)
ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(
        mark_line(4:6, 1:2, 
            .element =  element_line(color = c("red", "blue"))
        )
    ) +
    geom_boxplot(aes(.names, value, fill = .names)) +
    facet_wrap(vars(), scales = "free", strip.position = "right") +
    theme(plot.margin = margin(l = 0.1, t = 0.1, unit = "npc"))
#> → heatmap built with `geom_tile()`

set.seed(123)
ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(
        mark_tetragon(4:6, 1:2, 
            .element =  element_polygon(fill = c("red", "blue"), alpha = 0.5)
        )
    ) +
    geom_boxplot(aes(.names, value, fill = .names)) +
    facet_wrap(vars(), scales = "free", strip.position = "right") +
    theme(plot.margin = margin(l = 0.1, t = 0.1, unit = "npc"))
#> → heatmap built with `geom_tile()`
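The default recycling rule can be sketched in base R. Assuming the two groups of mark_line(4:6, 1:2) and the colour vector c("red", "blue"), each colour is assigned to a whole group, and every observation in that group inherits it. This only mirrors the behaviour described above; it is not ggalign's own code.

```r
# Two defined groups, as in mark_line(4:6, 1:2, ...)
selection <- list(4:6, 1:2)
colours <- c("red", "blue")

# Default behaviour: recycle the vectorized field to the number of groups
col_by_group <- rep_len(colours, length(selection))

# Every observation inherits the colour of its group
col_by_obs <- rep(col_by_group, lengths(selection))
col_by_obs
```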

Wrapping the element with I() instead recycles its vectorized fields to match the drawing groups. For element_line(), the drawing groups typically correspond to the individual observations, since each observation is linked to the plot panel by its own line.

set.seed(123)
ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(
        mark_line(4:6, 1:2, 
            .element =  I(element_line(color = c("red", "blue")))
        )
    ) +
    geom_boxplot(aes(.names, value, fill = .names)) +
    facet_wrap(vars(), scales = "free", strip.position = "right") +
    theme(plot.margin = margin(l = 0.1, t = 0.1, unit = "npc"))
#> → heatmap built with `geom_tile()`
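By contrast, with I() the same colour vector is recycled over the drawing groups. Assuming that each selected observation forms one drawing group and that the groups are taken in selection order, the five observations selected by mark_line(4:6, 1:2) would receive alternating colours, as this base-R sketch shows. The actual drawing order inside ggalign may differ.

```r
# Two defined groups, as in mark_line(4:6, 1:2, ...)
selection <- list(4:6, 1:2)

# Assumption: one link line (drawing group) per selected observation
n_lines <- sum(lengths(selection))

# With I(): recycle the vectorized field over the drawing groups
col_by_line <- rep_len(c("red", "blue"), n_lines)
col_by_line
```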

For element_polygon(), the drawing groups usually align with the defined groups. However, if the observations of a defined group are not contiguous in the layout and cannot be linked with a single quadrilateral, the number of drawing groups will exceed the number of defined groups.

set.seed(123)
ggheatmap(small_mat) +
    theme(axis.text.x = element_text(hjust = 0, angle = -60)) +
    anno_right() +
    align_kmeans(3L) +
    ggmark(
        mark_tetragon(4:6, 1:2, 
            .element =  I(element_polygon(fill = c("red", "blue"), alpha = 0.5))
        )
    ) +
    geom_boxplot(aes(.names, value, fill = .names)) +
    facet_wrap(vars(), scales = "free", strip.position = "right") +
    theme(plot.margin = margin(l = 0.1, t = 0.1, unit = "npc"))
#> → heatmap built with `geom_tile()`
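One way to reason about how many quadrilaterals a group needs is to count the runs of consecutive positions its observations occupy in the layout ordering. The helper count_runs() below is hypothetical (it is not part of ggalign), and the ordering ord is made up for illustration.

```r
# Hypothetical helper (not part of ggalign): number of runs of consecutive
# positions, i.e. how many quadrilaterals a group would need.
count_runs <- function(positions) {
    positions <- sort(unique(positions))
    if (length(positions) == 0L) return(0L)
    sum(diff(positions) != 1L) + 1L
}

count_runs(4:6)          # contiguous block -> 1
count_runs(c(1, 2, 5))   # split into two blocks -> 2

# Hypothetical layout ordering of rows 1..7 (e.g. after clustering):
# rows 4, 5, 6 land at positions 2, 5, 6, which form two blocks.
ord <- c(3, 4, 1, 7, 5, 6, 2)
count_runs(match(4:6, ord))
```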

For stack_layout(), we usually don’t need to specify the observations for hand2 (the right-hand side of the formula, waiver() in the example below), since they should match those of hand1: all plots in stack_discrete() share the same ordering index. Specifying hand2 becomes useful in stack_cross(), where different orderings are involved.

stack_discreteh(small_mat) +
    align_dendro(aes(color = branch), k = 3L) +
    scale_x_reverse(expand = expansion()) +
    theme(plot.margin = margin()) +
    ggmark(mark_line(4:6 ~ waiver(), 1:2 ~ waiver())) +
    geom_boxplot(aes(.names, value, fill = .names)) +
    theme(plot.margin = margin(l = 0.1, t = 0.1, r = 0.1, b = 0.1, unit = "npc")) +
    align_dendro(aes(color = branch), k = 3L) +
    scale_x_continuous(expand = expansion()) +
    theme(plot.margin = margin())
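The reasoning behind this can be illustrated with plain position lookups. In the base-R sketch below, the orderings are hypothetical: when both hands share one ordering (the stack_discrete() case), the selected observations sit at identical positions on both sides, whereas a different ordering (as may happen in stack_cross()) moves them, which is when a separate hand2 selection becomes meaningful.

```r
obs <- paste0("row", 1:7)

# Hypothetical orderings, for illustration only
order_hand1  <- obs[c(3, 4, 1, 7, 5, 6, 2)]
order_shared <- order_hand1               # same ordering on both hands
order_cross  <- obs[c(7, 6, 5, 4, 3, 2, 1)] # a different ordering

selected <- c("row4", "row5", "row6")

pos1     <- match(selected, order_hand1)  # positions on hand1
pos2     <- match(selected, order_shared) # identical positions
pos_diff <- match(selected, order_cross)  # positions diverge
```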

Now, let’s move on to the next chapter, where we will explore quad_layout() in full. We have already introduced ggheatmap(), a specialized version of quad_layout(), and most of the operations discussed in Chapter 3 also apply to quad_layout().