Calculates the decay of linkage disequilibrium (r\(^2\) or rV\(^2\))
with physical distance for each chromosome, fits optional decay models
(Hill-Weir nonlinear or LOESS), and determines the distance at which LD
drops below a user-specified or data-driven critical threshold. The output
can be passed directly to define_qtl_regions() to define
chromosome-specific candidate gene windows around GWAS hits.
Usage
compute_ld_decay(
geno,
snp_info = NULL,
method = c("r2", "rV2"),
kin_method = c("chol", "eigen"),
sampling = c("random", "sliding_window", "both"),
window_snps = 50L,
n_pairs = 50000L,
max_dist = 5000000L,
r2_threshold = "both",
n_unlinked = 100000L,
pctile = 95,
fit_model = c("loess", "nonlinear", "both", "none"),
n_bins = 100L,
by_chr = TRUE,
chr = NULL,
seed = 42L,
n_threads = 1L,
verbose = TRUE
)Arguments
- geno
LDxBlocks_backend(fromread_genoorread_geno_bigmemory), a numeric dosage matrix (individuals x SNPs), or a file path accepted byread_geno.- snp_info
Data frame with columns
SNP,CHR,POS. Auto-extracted from the backend whengenois a backend object.- method
LD metric:
"r2"(default) or"rV2"(kinship-adjusted; requires AGHmatrix and ASRgenomics).- kin_method
Whitening method for
rV2:"chol"(default) or"eigen".- sampling
Pair-sampling strategy:
"random"(default, WGS-safe),"sliding_window"(TASSEL approach), or"both".- window_snps
Integer. Window size in SNPs for
"sliding_window". Default50L.- n_pairs
Integer. Maximum SNP pairs per chromosome for
"random"sampling. Default50000L.- max_dist
Integer. Maximum physical distance (bp) between a pair. Default
5000000L(5 Mb).- r2_threshold
Threshold for decay distance: a numeric value (e.g.
0.1),"parametric"(95th percentile of unlinked r\(^2\)),"both", orNULL. Default"both".- n_unlinked
Integer. Cross-chromosome pairs for parametric threshold. Default
100000L.- pctile
Numeric (0-100). Percentile of unlinked r\(^2\) distribution for the parametric threshold. Default
95.- fit_model
Decay model:
"loess"(default),"nonlinear"(Hill-Weir),"both", or"none".- n_bins
Integer. Distance bins for the smoothed decay curve. Default
100L.- by_chr
Logical. Return per-chromosome decay distances. Default
TRUE.- chr
Character vector of chromosomes to process.
NULL= all.- seed
Integer. RNG seed. Default
42L.- n_threads
Integer. OpenMP threads for C++ r\(^2\) kernel. Default
1L.- verbose
Logical. Print timestamped progress. Default
TRUE.
Value
A named list of class LDxBlocks_decay:
pairsData frame:
CHR,dist_bp,r2.decay_curveBinned decay curve per chromosome:
CHR,dist_bp,r2_mean,r2_median,n_pairs, and optionallyr2_loess,r2_hw.decay_distPer-chromosome decay distance:
CHR,decay_dist_bp,decay_dist_kb,threshold_used,r2_col_used,censored(TRUEwhen the smoothed curve never drops below the threshold withinmax_dist– the returned distance is a lower bound, not an estimate; extendmax_distor lower the threshold).decay_dist_genomeGenome-wide median decay distance (bp).
critical_r2Active threshold value.
critical_r2_fixedAlways set to 0.1 as a reference value (standard threshold used in most GWAS papers). This field is populated regardless of which
r2_thresholdwas requested. It is the active threshold only whenr2_threshold = 0.1orr2_threshold = "both"with no parametric result available.critical_r2_paramParametric threshold (
NULLif not computed).unlinked_r2Cross-chromosome r\(^2\) values used for the parametric threshold (
NULLif not computed).model_paramsHill-Weir C parameter per chromosome (
NULLif not fitted).n_pairs_usedNamed integer: pairs computed per chromosome.
methodLD metric used.
samplingSampling strategy used.
callMatched function call.
Details
C++ acceleration: All pairwise r\(^2\) computations use the package's compiled C++ kernels (Armadillo BLAS via RcppArmadillo):
Random sampling:
compute_r2_sparse_cpp()computes all pairs withinmax_distin a single C++ call on the compact submatrix, replacing an R pair-by-pair loop.n_threadsis forwarded to enable OpenMP parallelism.Sliding window: column preparation is done once per window via
apply(); within-windowstats::cor()is used (windows are small, typically 50 SNPs).Parametric threshold:
col_r2_cpp()computes cross-chromosome r\(^2\) in Armadillo BLAS.
Memory-efficient backend access:
For GDS and bigmemory backends the function never loads a full chromosome.
Instead it samples pair indices from SNP positions (no genotype I/O),
collects only the unique SNP indices involved, and calls
read_chunk() once for that compact set. For 50,000 random pairs
on a 500,000-SNP chromosome with 500 individuals, peak RAM is
approximately 40 MB rather than 2 GB.
Important: method = "rV2" is not memory-safe at WGS scale.
The kinship whitening matrix requires a full-genome read of all SNPs once
before per-chromosome processing begins. For panels with millions of markers
this can be the dominant memory step. Use method = "r2" for LD
decay estimation; reserve method = "rV2" for block detection.
Sampling strategies:
"sliding_window"Computes r\(^2\) within a moving window of
window_snpsSNPs, reading only those columns per step. Uses an adaptive stride: the window is placed at approximatelyn_pairs / (window_snps*(window_snps-1)/2)evenly-spaced positions across the chromosome, so total pairs stays nearn_pairsregardless of marker density. Forwindow_snps=50on a 500k-SNP chromosome withn_pairs=50000: stride=12,195, only 41read_chunkcalls instead of 10,000. A density guard automatically switches chromosomes with > 200,000 SNPs to"random"sampling (consecutive SNPs at WGS density are always in near- perfect LD, making the sliding-window curve uninformative). Matches the TASSEL approach (Bradbury et al. 2007); recommended for chip-density panels (< 200k SNPs per chromosome)."random"Randomly samples up to
n_pairsSNP pairs per chromosome withinmax_distbp, then loads only the unique SNPs involved. Bounded RAM regardless of panel density; recommended for WGS panels (> 500k SNPs per chromosome)."both"Sliding window for the decay curve shape; random sampling for the parametric threshold estimation.
Critical r\(^2\) threshold:
- Fixed numeric (e.g.
0.1) Standard value used in most GWAS papers. The distance where the decay curve crosses this value defines the LD block extent for candidate gene searches.
"parametric"95th percentile of r\(^2\) between markers on different chromosomes (unlinked markers). In a structured population, unlinked r\(^2\) > 0 due to kinship. This threshold captures the background LD attributable to structure, above which LD is genuine. A high parametric threshold (> 0.05) is a strong indicator that
method = "rV2"should be used for block detection."both"Returns both; uses the parametric as the active threshold for decay distance estimation.
Decay model: The Hill-Weir (1988) expectation relates expected r\(^2\) to distance \(d\) (in Mb) via \(C = 4 N_e r d\). LDxBlocks fits this per chromosome with C as the free parameter via nonlinear least squares. The LOESS smooth is a non-parametric alternative robust to non-standard decay shapes.
Connection to define_qtl_regions() and GWAS integration:
The output of compute_ld_decay() can be passed directly to
define_qtl_regions(ld_decay = decay) to replace
fixed block boundaries with chromosome-specific LD-aware windows.
- Without
ld_decay define_qtl_regions()asks which blocks CONTAIN a significant GWAS SNP? The search window equals the block boundary. GWAS hits in gaps between blocks or near block edges are missed.- With
ld_decay Asks which blocks are IN LD with a significant GWAS SNP? The search window is extended by the chromosome-specific decay distance on both sides. Adds columns
candidate_region_start,candidate_region_end, andcandidate_region_size_kbready for BioMart/Ensembl Plants candidate gene queries.
This directly feeds integrate_gwas_haplotypes(): the
GWAS biological-evidence layer (layer 1 of the three-evidence score)
awards a point to blocks that contain a GWAS hit. With
ld_decay, blocks near a hit (within the decay distance)
also receive the evidence point, making the three-evidence scoring
in rank_haplotype_blocks() more sensitive and biologically
accurate. The recommended workflow is therefore:
decay <- compute_ld_decay(be, ...)qtl <- define_qtl_regions(..., ld_decay = decay)ranked <- integrate_gwas_haplotypes(qtl, pred, ...)
Examples
# \donttest{
data(ldx_geno, package = "LDxBlocks")
data(ldx_snp_info, package = "LDxBlocks")
# Random sampling -- memory-safe for any panel density
decay <- compute_ld_decay(
geno = ldx_geno,
snp_info = ldx_snp_info,
sampling = "random",
r2_threshold = "both",
fit_model = "loess",
n_pairs = 5000L,
verbose = FALSE
)
decay$critical_r2_param # background kinship-induced LD
#> 95%
#> 0.03718883
decay$critical_r2_fixed # standard 0.1 threshold
#> [1] 0.1
decay$decay_dist # per-chromosome decay distances
#> CHR decay_dist_bp decay_dist_kb threshold_used r2_col_used censored
#> 1 1 39181 39.18 0.03718883 r2_loess FALSE
#> 2 2 50902 50.90 0.03718883 r2_loess FALSE
#> 3 3 29560 29.56 0.03718883 r2_loess FALSE
# Sliding window (TASSEL approach) + Hill-Weir model
decay_hw <- compute_ld_decay(
geno = ldx_geno,
snp_info = ldx_snp_info,
sampling = "sliding_window",
window_snps = 50L,
r2_threshold = 0.1,
fit_model = "nonlinear",
verbose = FALSE
)
# WGS backend -- only sampled columns are loaded per chromosome
if (FALSE) { # \dontrun{
be <- read_geno("my_wgs.vcf.gz")
decay_wgs <- compute_ld_decay(
geno = be,
sampling = "random",
n_pairs = 50000L,
max_dist = 5000000L,
r2_threshold = "both",
fit_model = "loess",
n_threads = 8L
)
close_backend(be)
} # }
# Pass to define_qtl_regions for chromosome-specific candidate windows
data(ldx_blocks, package = "LDxBlocks")
data(ldx_gwas, package = "LDxBlocks")
qtl <- define_qtl_regions(ldx_gwas, ldx_blocks, ldx_snp_info,
ld_decay = decay,
p_threshold = NULL,
trait_col = "trait")
qtl[, c("block_id", "lead_marker", "candidate_region_start",
"candidate_region_end", "candidate_region_size_kb")]
#> block_id lead_marker candidate_region_start candidate_region_end
#> 1 block_1_1000_25027 rs1005 0 44369
#> 2 block_1_81064_99022 rs1048 58031 136393
#> 3 block_1_155368_179371 rs1070 129829 208191
#> 4 block_2_1000_30023 rs2004 0 55303
#> 5 block_2_86236_105290 rs2050 49548 151352
#> 6 block_3_1000_19068 rs3004 0 33383
#> candidate_region_size_kb
#> 1 78.4
#> 2 78.4
#> 3 78.4
#> 4 101.8
#> 5 101.8
#> 6 59.1
# }