Computes the density or count of genomic regions in sliding or fixed windows across the genome. The density can be reported as the fraction of covered bases or the number of overlapping regions within each window.
Usage
genomic_density(
region,
window_size = 1e+07,
n_window = NULL,
overlap = TRUE,
mode = c("coverage", "count"),
seqlengths = NULL
)
Arguments
- region
A data frame with at least 3 columns: chromosome, start, and end.
Column 1: character or factor, chromosome name.
Column 2: numeric, start position (must be <= end).
Column 3: numeric, end position.
- window_size
Numeric, the width of each window (default is 1e+07). Ignored if n_window is specified.
- n_window
Integer, the number of windows per chromosome. If provided, overrides window_size and evenly splits the chromosome into n_window (non-overlapping) or 2*n_window - 1 (overlapping) windows.
- overlap
Logical, whether to use overlapping windows (default TRUE). Overlapping windows are spaced by half the window size.
- mode
Character, either "coverage" or "count":
"count": reports the number of regions overlapping each window.
"coverage": reports the fraction of each window covered by regions.
- seqlengths
Optional named vector of chromosome lengths. If missing, the maximum end value in the input is used as the chromosome length.
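The interaction between window_size, n_window, and overlap can be illustrated with a minimal sketch. The helper tile_windows() below is hypothetical and not part of the package. It assumes 1-based inclusive coordinates, assumes the chromosome length divides evenly by n_window, and drops any window that would extend past the chromosome end. The real function may handle boundaries differently.

```r
# Hypothetical helper (not the package implementation): tile one chromosome
# of length chr_len into windows, following the rules described above.
tile_windows <- function(chr_len, window_size = 1e7, n_window = NULL,
                         overlap = TRUE) {
  if (!is.null(n_window)) {
    # n_window overrides window_size: split the chromosome into equal parts
    window_size <- chr_len / n_window
  }
  # Overlapping windows start every half window; otherwise every full window
  step <- if (overlap) window_size / 2 else window_size
  start <- seq(1, chr_len, by = step)
  end <- start + window_size - 1
  # Assumption: windows running past the chromosome end are dropped
  keep <- end <= chr_len
  data.frame(start = start[keep], end = end[keep])
}

# n_window = 3 gives 3 windows without overlap and 2 * 3 - 1 = 5 with overlap
nrow(tile_windows(12, n_window = 3, overlap = FALSE))
nrow(tile_windows(12, n_window = 3, overlap = TRUE))
```

Under these assumptions, tile_windows(17000000, window_size = 1e7) yields two windows, 1 to 10000000 and 5000001 to 15000000, because the next half-step window would run past the chromosome end.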
Value
A data frame with columns:
seqnames: the sequence (e.g., chromosome) names
start: start of each window
end: end of each window
density: the region count or covered fraction, depending on mode
Details
This function splits the input by chromosome and tiles the genomic space into windows, optionally overlapping. For each window, it calculates:
the number of regions that overlap it (if mode = "count"), or
the fraction of bases covered by any region (if mode = "coverage").
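The two per-window calculations can be sketched as follows. The helper window_density() is hypothetical and only illustrates the idea. It assumes 1-based inclusive coordinates and merges overlapping regions so that shared bases are counted once. The package may compute these values differently.

```r
# Hypothetical helper (not the package implementation): density of regions
# within a single window [w_start, w_end].
window_density <- function(starts, ends, w_start, w_end,
                           mode = c("coverage", "count")) {
  mode <- match.arg(mode)
  # A region overlaps the window if the two intervals share at least one base
  hit <- starts <= w_end & ends >= w_start
  if (mode == "count") return(sum(hit))
  if (!any(hit)) return(0)
  # Clip overlapping regions to the window and sort them by start
  s <- pmax(starts[hit], w_start)
  e <- pmin(ends[hit], w_end)
  ord <- order(s)
  s <- s[ord]
  e <- e[ord]
  # Merge overlapping or adjacent intervals and sum their widths
  covered <- 0
  cur_s <- s[1]
  cur_e <- e[1]
  for (i in seq_along(s)[-1]) {
    if (s[i] <= cur_e + 1) {
      cur_e <- max(cur_e, e[i])
    } else {
      covered <- covered + (cur_e - cur_s + 1)
      cur_s <- s[i]
      cur_e <- e[i]
    }
  }
  covered <- covered + (cur_e - cur_s + 1)
  covered / (w_end - w_start + 1)
}

starts <- c(100, 5000000, 15000000)
ends <- c(2000000, 7000000, 17000000)
window_density(starts, ends, 1, 10000000, mode = "count")  # 2 regions overlap
window_density(c(1, 3), c(4, 6), 1, 10, mode = "coverage") # bases 1-6 of 10
```

Merging before summing matters in coverage mode: without it, bases shared by two overlapping regions would be counted twice and the fraction could exceed 1.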
Examples
region <- data.frame(
chr = rep("chr1", 3),
start = c(100, 5000000, 15000000),
end = c(2000000, 7000000, 17000000)
)
genomic_density(region, window_size = 1e7, mode = "count")
#> seqnames start end density
#> 1 chr1 1 10000000 2
#> 2 chr1 5000001 15000000 2
genomic_density(region, n_window = 3, overlap = FALSE, mode = "coverage")
#> seqnames start end density
#> 1 chr1 1 5666667 0.4705710
#> 2 chr1 5666667 11333334 0.2352942
#> 3 chr1 11333334 17000001 0.3529413