Reorder or Group observations based on hierarchical clustering
Source: R/align-dendrogram.R
align_dendro.Rd
This function aligns observations within the layout according to a hierarchical clustering tree, enabling reordering or grouping of elements based on clustering results.
Usage
align_dendro(
  mapping = aes(),
  ...,
  distance = "euclidean",
  method = "complete",
  use_missing = "pairwise.complete.obs",
  reorder_dendrogram = FALSE,
  merge_dendrogram = FALSE,
  reorder_group = FALSE,
  k = NULL,
  h = NULL,
  cutree = NULL,
  plot_dendrogram = TRUE,
  plot_cut_height = NULL,
  root = NULL,
  center = FALSE,
  type = "rectangle",
  size = NULL,
  data = NULL,
  no_axes = NULL,
  active = NULL,
  free_guides = deprecated(),
  free_spaces = deprecated(),
  plot_data = deprecated(),
  theme = deprecated(),
  free_labs = deprecated(),
  set_context = deprecated(),
  order = deprecated(),
  name = deprecated()
)
Arguments
- mapping
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
- ...
<dyn-dots> Additional arguments passed to geom_segment().
- distance
A string naming the distance measure to use. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". A correlation coefficient can also be used: "pearson", "spearman" or "kendall"; in that case, 1 - cor is used as the distance. You can also provide a dist object directly, or a function that returns a dist object. Use NULL if you don't want to calculate the distance.
- method
A string naming the agglomeration method to use. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). You can also provide a function that accepts the calculated distance (or the input matrix if distance is NULL) and returns an hclust object. Alternatively, you can supply an object that can be coerced to hclust.
- use_missing
An optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs". Only used when distance is a correlation coefficient string.
- reorder_dendrogram
A single boolean value indicating whether to reorder the dendrogram based on the means. Alternatively, you can provide a custom function that accepts an hclust object and the data used to generate the tree, and returns either an hclust or dendrogram object. Default is FALSE.
- merge_dendrogram
A single boolean value indicating whether to merge multiple dendrograms. Only used when previous groups have been established. Default: FALSE.
- reorder_group
A single boolean value indicating whether to perform hierarchical clustering between groups. Only used when previous groups have been established. Default: FALSE.
- k
An integer scalar indicating the desired number of groups.
- h
A numeric scalar indicating the height at which the tree should be cut.
- cutree
A function used to cut the hclust tree. It should accept four arguments: the hclust tree object, distance (only applicable when method is a string or a function for performing hierarchical clustering), k (the number of clusters), and h (the height at which to cut the tree). By default, cutree() is used.
- plot_dendrogram
A boolean value indicating whether to plot the dendrogram tree.
- plot_cut_height
A boolean value indicating whether to plot the cut height.
- root
A length-one string or numeric value indicating the root branch.
- center
A boolean value. If TRUE, nodes are plotted centered with respect to the leaves in the branch. Otherwise (default), they are plotted in the middle of all direct child nodes.
- type
A string indicating the plot type: "rectangle" or "triangle".
- size
The relative size of the plot; it can be specified as a unit.
- data
A matrix-like object. By default, it inherits from the layout matrix.
- no_axes
Logical; if TRUE, removes axis elements for the alignment axis using theme_no_axes(). By default, this is controlled by the option "ggalign.align_no_axes".
- active
An active() object that defines the context settings when added to a layout.
- free_guides
Please use the plot_align() function instead.
- free_spaces
Please use the plot_align() function instead.
- plot_data
Please use the plot_data() function instead.
- theme
Please use the plot_theme() function instead.
- free_labs
Please use the plot_align() function instead.
- set_context
- order
- name
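To make the distance, method, k and h arguments more concrete, the following base R sketch walks through the computations they describe. It is an illustration of the documented semantics using stats functions only, not ggalign's internal implementation; the toy matrix `x` and the choice to correlate observations by transposing are assumptions of this sketch.

```r
set.seed(42)
# Toy data (hypothetical): 6 observations (rows) by 4 variables (columns).
x <- matrix(rnorm(24), nrow = 6, dimnames = list(paste0("obs", 1:6), NULL))

# distance = "euclidean": ordinary distances between rows.
d_euclid <- dist(x, method = "euclidean")

# distance = "pearson": 1 - cor is used as the distance.
# cor() correlates columns, so the matrix is transposed here to compare
# observations (an assumption of this sketch); `use` plays the role of
# use_missing.
d_cor <- as.dist(
  1 - cor(t(x), method = "pearson", use = "pairwise.complete.obs")
)

# method = "complete": agglomerate with complete linkage.
hc <- hclust(d_euclid, method = "complete")

# k: cut the tree into a fixed number of groups.
groups_k <- cutree(hc, k = 3)

# h: cut the tree at a given height instead.
groups_h <- cutree(hc, h = median(hc$height))
```

Here `groups_k` and `groups_h` are integer vectors with one group label per observation, which is the kind of grouping the k and h arguments request.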
ggplot2 specification
align_dendro initializes a ggplot data and mapping.
Internally, a default mapping of aes(x = .data$x, y = .data$y) is always used.
The default ggplot data is the node coordinates, with the edge data attached in the ggalign attribute. In addition, a geom_segment layer that uses the edge coordinates as its data is added.
The node and edge (tree segment) coordinates contain the following columns:
- index: the original index in the tree for the current node.
- label: node label text.
- x and y: x-axis and y-axis coordinates of the current node, or of the start node of the current edge.
- xend and yend: the x-axis and y-axis coordinates of the terminal node of the current edge.
- branch: which branch the current node or edge belongs to. You can use this column to color different groups.
- panel: which panel the current node is in. If the plot is split into panels using facet_grid, this column shows which panel the current node or edge is from. Note: some nodes may fall outside any panel (between two panels), so this column may contain NA values.
- .panel: similar to the panel column, but always gives the correct branch for use with ggplot faceting.
- panel1 and panel2: these have the same function as panel, but apply specifically to the edge data and correspond to the two nodes of each edge.
- leaf: a logical value indicating whether the current node is a leaf.
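As a hedged usage sketch, the branch column can be mapped to color so that each cut group is drawn differently. The example assumes the ggalign and ggplot2 packages are installed and that `ggheatmap()` and `anno_top()` are available in the installed ggalign version; the random matrix is made up for illustration, and the code is skipped when the packages are missing.

```r
# Assumption: ggalign and ggplot2 are installed; otherwise nothing is built.
has_ggalign <- requireNamespace("ggalign", quietly = TRUE) &&
  requireNamespace("ggplot2", quietly = TRUE)

if (has_ggalign) {
  library(ggalign)

  set.seed(1)
  mat <- matrix(rnorm(81), nrow = 9)  # hypothetical toy data

  # Add a dendrogram on top of the heatmap, cut into 3 groups, and color
  # the tree segments by the `branch` column described above.
  p <- ggheatmap(mat) +
    anno_top() +
    align_dendro(ggplot2::aes(color = branch), k = 3L)
}
```

Printing `p` in an interactive session would render the plot.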
Axis Alignment for Observations
It is important to note that we consider rows as observations, meaning vec_size(data)/NROW(data) must match the number of observations along the axis used for alignment (x-axis for a vertical stack layout, y-axis for a horizontal stack layout).
- quad_layout()/ggheatmap(): For column annotation, the layout matrix is transposed before use (if data is a function, it is applied to the transposed matrix), because column annotation uses columns as observations but alignment requires rows.
- stack_layout(): The layout matrix is used as is, aligning all plots along a single axis.
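The rows-as-observations rule can be checked with a small base R sketch. It only mimics the transposition described above; `mat` and the helper `scale_rows()` are hypothetical names introduced for illustration and are not part of ggalign.

```r
# Hypothetical layout matrix: 5 rows, 3 columns.
mat <- matrix(seq_len(15), nrow = 5, ncol = 3)

# Stack layout: the matrix is used as is, so there are 5 observations.
n_stack <- NROW(mat)

# Column annotation in a quad layout or heatmap: the matrix is transposed
# first, so the 3 columns become the observations.
n_col_anno <- NROW(t(mat))

# If `data` is a function, it receives the transposed matrix.
# scale_rows() is a made-up example of such a function.
scale_rows <- function(m) m / rowSums(m)
anno_data <- scale_rows(t(mat))
```

In both cases the number of rows of the data that reaches the plot equals the number of observations along the alignment axis.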