Fast C++/Armadillo implementation of the standard pairwise squared Pearson
correlation (r^2) for a window of SNP columns. Missing genotypes (NA) are
mean-imputed per column before computation. This is the default LD metric
in LDxBlocks and is 10-50x faster than stats::cor().
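
The two steps described above (per-column mean imputation, then pairwise squared Pearson correlation) can be sketched as follows. This is a minimal illustration using only the C++ standard library, not the package's Armadillo code: the function name `pairwise_r2`, the column-major `std::vector` layout, the use of NaN to stand in for NA, and the convention of returning 0 for pairs involving a monomorphic column are all assumptions made for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the LDxBlocks API): pairwise r^2 for a window of
// SNP columns. geno[j] holds the genotypes of SNP j for every individual;
// NaN stands in for NA. All columns are assumed to have the same length.
std::vector<std::vector<double>> pairwise_r2(std::vector<std::vector<double>> geno) {
    const std::size_t p = geno.size();
    std::vector<double> ss(p, 0.0);  // per-column sum of squared deviations

    for (std::size_t j = 0; j < p; ++j) {
        std::vector<double>& col = geno[j];

        // Column mean over the observed (non-NaN) genotypes only.
        double sum = 0.0;
        std::size_t n_obs = 0;
        for (double g : col) {
            if (!std::isnan(g)) { sum += g; ++n_obs; }
        }
        const double mean = n_obs > 0 ? sum / static_cast<double>(n_obs) : 0.0;

        // Mean-impute and centre in one pass: an imputed entry equals the
        // column mean, so its centred value is exactly 0.
        for (double& g : col) {
            g = std::isnan(g) ? 0.0 : g - mean;
            ss[j] += g * g;
        }
    }

    std::vector<std::vector<double>> r2(p, std::vector<double>(p, 0.0));
    for (std::size_t j = 0; j < p; ++j) {
        r2[j][j] = 1.0;
        for (std::size_t k = j + 1; k < p; ++k) {
            // Cross-product of the two centred columns.
            double cross = 0.0;
            for (std::size_t i = 0; i < geno[j].size(); ++i) {
                cross += geno[j][i] * geno[k][i];
            }
            // Convention chosen for this sketch: a monomorphic column
            // (zero variance) gives r^2 = 0 rather than NaN.
            const double denom = ss[j] * ss[k];
            const double value = denom > 0.0 ? (cross * cross) / denom : 0.0;
            r2[j][k] = value;
            r2[k][j] = value;
        }
    }
    return r2;
}
```

Because imputed entries sit exactly at the column mean, they add nothing to the cross-product or to the sum of squares; they only contribute to the sample size.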
When to use r^2 vs rV^2
- r^2
Use for large unstructured datasets (> 500 k markers), random-mating populations, or whenever speed matters. The standard estimator is inflated in related populations (it overestimates LD), but this usually yields slightly larger, more conservative blocks rather than catastrophically wrong ones.
- rV^2
Use for highly structured or related populations (livestock, inbred lines, family-based human cohorts) where kinship inflation would meaningfully distort block boundaries. Requires computing and inverting the genomic relationship matrix (GRM), which becomes prohibitive beyond ~5 k individuals.