Reorder or group the layout based on hierarchical clustering

Usage

align_dendro(
  mapping = aes(),
  ...,
  distance = "euclidean",
  method = "complete",
  use_missing = "pairwise.complete.obs",
  reorder_group = FALSE,
  k = NULL,
  h = NULL,
  plot_dendrogram = TRUE,
  plot_cut_height = NULL,
  root = NULL,
  center = FALSE,
  type = "rectangle",
  size = NULL,
  data = NULL,
  free_labs = waiver(),
  free_spaces = waiver(),
  plot_data = waiver(),
  set_context = TRUE,
  order = NULL,
  name = NULL
)

Arguments

mapping

Additional default list of aesthetic mappings to use for plot.

...

Additional arguments passed to geom_segment().

distance

A string naming the distance measure to use. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". A correlation coefficient can also be used: "pearson", "spearman" or "kendall". In that case, 1 - cor is used as the distance. You can also provide a dist object directly, or a function that returns a dist object.
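The correlation-based option can be reproduced by hand in base R. The sketch below is an illustration of the 1 - cor idea, not ggalign's internal code; the matrix x and the helper manhattan_dist are hypothetical names introduced here.

```r
# Sketch only: base R, not ggalign internals.
set.seed(1)
x <- matrix(rnorm(40), nrow = 8)  # 8 observations (rows), 5 variables
x[2, 3] <- NA                     # one missing value, to exercise `use`

# cor() correlates columns, so transpose to correlate observations (rows).
r <- cor(t(x), method = "pearson", use = "pairwise.complete.obs")

# 1 - cor as a distance: 0 means perfectly correlated, 2 anti-correlated.
d_cor <- as.dist(1 - r)

# A user-supplied function that returns a dist object (hypothetical helper).
manhattan_dist <- function(m) dist(m, method = "manhattan")
d_fun <- manhattan_dist(x)
```

Either d_cor or manhattan_dist has the shape the distance argument is described as accepting; the assumption that the function receives the data matrix as its only argument is not confirmed by this page.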

method

A string naming the agglomeration method to use. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). You can also provide a function that returns a hclust object.
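The method names are those of stats::hclust(), which is where the abbreviation rule comes from. A minimal base R sketch follows; my_hclust is a hypothetical helper, and the assumption that a method function receives the dist object is not confirmed by this page.

```r
# Sketch only: base R, not ggalign internals.
set.seed(7)
x <- matrix(rnorm(60), nrow = 10)
d <- dist(x)

# hclust() partially matches the method name, so "ave" resolves to "average".
tree_abbrev <- hclust(d, method = "ave")
tree_full <- hclust(d, method = "average")
same_tree <- identical(tree_abbrev$merge, tree_full$merge)

# A user-supplied function that returns a hclust object (hypothetical helper).
my_hclust <- function(d) hclust(d, method = "ward.D2")
tree_custom <- my_hclust(d)
```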

use_missing

An optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs". Only used when distance is a correlation coefficient string.

reorder_group

A single boolean value indicating whether to perform hierarchical clustering between groups. Only used when groups have already been established.

k

An integer scalar giving the desired number of groups.

h

A numeric scalar giving the height at which the tree should be cut.
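The relationship between k and h can be seen with stats::cutree() in base R. This is a sketch of the general idea under the assumption that cutting works as in cutree(); it does not show how align_dendro performs the cut internally.

```r
# Sketch only: base R, not ggalign internals.
set.seed(42)
x <- matrix(rnorm(90), nrow = 9)  # 9 observations
tree <- hclust(dist(x, method = "euclidean"), method = "complete")

# Cut by number of groups.
groups_k <- cutree(tree, k = 3)

# Cut by height: choose a height between the two merges that take the
# tree from 3 groups to 2 groups, which also yields 3 groups.
top_heights <- rev(tree$height)   # merge heights, largest first
cut_h <- mean(top_heights[2:3])
groups_h <- cutree(tree, h = cut_h)

table(groups_k)
```

Here both calls assign the same group to every observation, because the chosen height separates the tree into exactly three branches.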

plot_dendrogram

A boolean value indicating whether to plot the dendrogram tree.

plot_cut_height

A boolean value indicating whether to plot the cut height.

root

A length-one string or numeric value indicating the root branch.

center

A boolean value. If TRUE, nodes are plotted centered with respect to the leaves in the branch. Otherwise (default), they are plotted in the middle of all direct child nodes.

type

A string indicating the plot type, either "rectangle" or "triangle".

size

Plot size; can be a unit object.

data

A matrix, a data frame, or even a simple vector that will be converted into a one-column matrix. If the data argument is set to NULL, the align_* function will use the layout data. The data argument can also accept a function (a purrr-like lambda is also okay), which will be applied to the layout data.

It is important to note that all align_* functions treat rows as observations. This means NROW(data) must equal the number of observations along the parallel layout axis.

  • layout_heatmap: for column annotation, the layout data will be transposed before use (if data is a function, it will be applied to the transposed matrix). This is necessary because column annotation uses heatmap columns as observations, but we need rows.

  • layout_stack: the layout data will be used as it is since we place all plots along a single axis.
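The row-as-observation rule can be checked with plain matrices. The sketch below uses base R only; the matrix m and the function first_two are hypothetical names used for illustration, and it mimics the described transposition rather than calling ggalign.

```r
# Sketch only: base R, not ggalign internals.
m <- matrix(seq_len(36), nrow = 9, ncol = 4)  # 9 heatmap rows, 4 heatmap columns

# Row annotation: observations are heatmap rows, so the data is used as is.
n_row_obs <- NROW(m)     # 9

# Column annotation: observations are heatmap columns, so transpose first.
n_col_obs <- NROW(t(m))  # 4

# A data function sees the already-transposed matrix in the column case.
# `first_two` (hypothetical) keeps two variables per observation.
first_two <- function(d) d[, 1:2, drop = FALSE]
ann <- first_two(t(m))
dim(ann)                 # 4 2
```

Whatever the function returns must still have one row per observation, which is why first_two subsets columns rather than rows.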

free_labs

A boolean value or a string containing one or more of "t", "l", "b", and "r", indicating which axis titles should be free from alignment. If NULL, all axis titles will be aligned. Default: "tlbr".

free_spaces

A boolean value or a string containing one or more of "t", "l", "b", and "r", indicating which border spaces should be removed. If NULL (default), no space will be removed.

plot_data

A function used to transform the plot data before rendering. By default, it is inherited from the parent layout. If there is no parent layout, the default is NULL, which means the data is left unmodified.

Used to modify the data after the layout has been created, but before the data is handed off to ggplot2 for rendering. Use this hook if you need to change the default data for all geoms.

set_context

A single boolean value indicating whether to set the active context to the current plot. If TRUE, all subsequent ggplot elements will be added to this plot.

order

A single integer for the layout order.

name

A string giving the plot name. Used to switch the active context in hmanno() or stack_active().

Value

A new Align object.

ggplot2 specification

align_dendro initializes the ggplot data and mapping.

Internally, a default mapping of aes(x = .data$x, y = .data$y) is always used.

The default ggplot data is the node coordinates. In addition, a geom_segment layer is added whose data is the edge coordinates of the tree segments.

The node and edge coordinate data contain the following columns:

  • index: the original index in the tree for the current node

  • label: node label text

  • x and y: the x-axis and y-axis coordinates of the current node, or of the start node of the current edge.

  • xend and yend: the x-axis and y-axis coordinates of the terminal node of the current edge.

  • branch: the branch the current node or edge belongs to. You can use this column to color different groups.

  • panel: the panel the current node is in. If the plot is split into panels with facet_grid, this column shows which panel the current node or edge comes from. Note: some nodes may fall outside any panel (between two panels), so this column may contain NA values. A .panel column is also provided, which always gives the correct branch for use with ggplot faceting.

  • .panel: see panel; this is the column most often used.

  • panel1 and panel2: these serve the same purpose as panel, but apply only to the edge data and correspond to the two nodes of each edge.

  • leaf: a logical value indicating whether the current node is a leaf.

Examples

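library(ggalign)

# Add a dendrogram to the top annotation of a heatmap
ggheatmap(matrix(rnorm(81), nrow = 9)) +
    hmanno("top") +
    align_dendro()

# Cut the tree into three groups
ggheatmap(matrix(rnorm(81), nrow = 9)) +
    hmanno("top") +
    align_dendro(k = 3L)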