Builds per-block haplotype dosage strings for all individuals across the
LD blocks defined in blocks. Each block is processed by the C++ engine
extract_chr_haplotypes_cpp() (unphased input) or
extract_chr_haplotypes_phased_cpp() (phased VCF input), which
assigns each individual a dosage string of 0/1/2 characters (one per SNP
in the block) and identifies the most frequent haplotype alleles.
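To make the dosage-string idea concrete, here is a minimal Python sketch of the two steps described above for the unphased case: encoding each individual's genotypes in one block as a 0/1/2 string, then ranking the most frequent strings. It is not the C++ engine. The helper names dosage_strings and top_haplotypes, the None-for-missing input layout, and the choice to skip strings containing a missing call when ranking are all assumptions of the sketch.

```python
from collections import Counter


def dosage_strings(block, na_char="."):
    """Encode each individual's genotypes in one block as a string.

    block: one row per individual, each row a list of 0/1/2 dosages,
    with None marking a missing call (hypothetical input layout).
    """
    return [
        "".join(na_char if g is None else str(int(g)) for g in row)
        for row in block
    ]


def top_haplotypes(strings, k, na_char="."):
    """Return up to k most frequent strings.

    Strings containing a missing call are skipped when counting; this is an
    assumption of the sketch, not documented LDxBlocks behavior.
    Ties keep first-seen order (Counter.most_common is stable).
    """
    counts = Counter(s for s in strings if na_char not in s)
    return [s for s, _ in counts.most_common(k)]


block = [[0, 1, 2], [0, 1, 2], [2, 1, 0], [0, None, 2]]
strings = dosage_strings(block)
print(strings)                       # ['012', '012', '210', '0.2']
print(top_haplotypes(strings, k=2))  # ['012', '210']
```

Here na_char plays the same role as the na_char argument: the missing call in the fourth individual shows up as "." inside its string.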
Arguments
- geno
One of:
  - An LDxBlocks_backend from read_geno or read_geno_bigmemory (streaming, one chromosome at a time).
  - A named list with elements hap1 and hap2 (phased SNPs x individuals matrices from read_phased_vcf).
  - A numeric matrix (individuals x SNPs, values 0/1/2/NA).
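Note that the list form stores hap1 and hap2 as SNPs x individuals, whereas the plain numeric matrix is individuals x SNPs, so the two inputs are transposed relative to each other. The Python sketch below assumes nested lists of 0/1 alleles and uses hypothetical helper names; it shows how one block of phased input yields two allele strings per individual and how summing the two gametes recovers the 0/1/2 individuals x SNPs orientation. It illustrates the data layout only and is not how read_phased_vcf or the C++ engine work internally.

```python
def phased_block_haplotypes(hap1, hap2, snp_rows):
    """Return one (gamete-1, gamete-2) string pair per individual.

    hap1, hap2: SNPs x individuals nested lists of 0/1 alleles, the
    orientation documented for the hap1/hap2 list input.
    snp_rows: row indices of the SNPs falling inside one block.
    """
    n_ind = len(hap1[0])
    return [
        ("".join(str(hap1[i][j]) for i in snp_rows),
         "".join(str(hap2[i][j]) for i in snp_rows))
        for j in range(n_ind)
    ]


def genotype_dosage(hap1, hap2):
    """Sum the two alleles and transpose to individuals x SNPs,
    the orientation of the plain numeric-matrix input."""
    n_snp, n_ind = len(hap1), len(hap1[0])
    return [[hap1[i][j] + hap2[i][j] for i in range(n_snp)] for j in range(n_ind)]


# 3 SNPs (rows) x 2 individuals (columns)
hap1 = [[0, 1], [1, 1], [0, 0]]
hap2 = [[1, 1], [1, 0], [0, 1]]
print(phased_block_haplotypes(hap1, hap2, [0, 1, 2]))
# [('010', '110'), ('110', '101')]
print(genotype_dosage(hap1, hap2))
# [[1, 2, 0], [2, 1, 1]]
```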
- snp_info
Data frame with columns SNP, CHR, POS.
- blocks
Data frame of LD blocks from run_Big_LD_all_chr, with columns CHR, start.bp, end.bp, n_snps.
- chr
Character vector of chromosomes to process. NULL (default) processes all chromosomes present in blocks.
- min_snps
Integer. Minimum number of SNPs a block must contain to be included. Default 3L.
- na_char
Character. Symbol used to denote a missing genotype in the dosage string. Default ".".
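The chr and min_snps arguments act as filters on blocks, and each retained block is matched to SNPs in snp_info by chromosome and position. A minimal Python sketch of that selection follows, using lists of dicts that mirror the documented columns. The helper names are hypothetical, chroms stands in for chr (renamed to avoid shadowing Python's built-in chr), and treating start.bp and end.bp as inclusive bounds is an assumption rather than documented behavior.

```python
def select_blocks(blocks, chroms=None, min_snps=3):
    """Keep blocks on the requested chromosomes with at least min_snps SNPs.

    blocks: list of dicts with keys CHR, start.bp, end.bp, n_snps.
    chroms plays the role of the chr argument; None keeps every chromosome.
    """
    return [
        b for b in blocks
        if (chroms is None or b["CHR"] in chroms) and b["n_snps"] >= min_snps
    ]


def snps_in_block(snp_info, block):
    """Return SNP IDs on the block's chromosome with start.bp <= POS <= end.bp.

    Inclusive bounds are an assumption of this sketch.
    """
    return [
        s["SNP"] for s in snp_info
        if s["CHR"] == block["CHR"]
        and block["start.bp"] <= s["POS"] <= block["end.bp"]
    ]


blocks = [
    {"CHR": "1", "start.bp": 100, "end.bp": 500, "n_snps": 3},
    {"CHR": "1", "start.bp": 900, "end.bp": 950, "n_snps": 2},
    {"CHR": "2", "start.bp": 100, "end.bp": 400, "n_snps": 5},
]
snp_info = [
    {"SNP": "rs1", "CHR": "1", "POS": 100},
    {"SNP": "rs2", "CHR": "1", "POS": 300},
    {"SNP": "rs3", "CHR": "1", "POS": 500},
    {"SNP": "rs4", "CHR": "1", "POS": 600},
    {"SNP": "rs5", "CHR": "2", "POS": 300},
]
print([(b["CHR"], b["start.bp"]) for b in select_blocks(blocks)])  # [('1', 100), ('2', 100)]
print([(b["CHR"], b["start.bp"]) for b in select_blocks(blocks, chroms=["1"])])  # [('1', 100)]
print(snps_in_block(snp_info, blocks[0]))  # ['rs1', 'rs2', 'rs3']
```

With the default min_snps of 3, the two-SNP block on chromosome 1 is dropped even though its chromosome is selected.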
Value
A named list of per-block haplotype dosage matrices (individuals x
haplotype alleles, values 0/1/2 for phased data or 0/1 for unphased).
The list carries a block_info attribute (data frame with one row
per block: block_id, CHR, start_bp, end_bp,
n_snps, n_haplotypes, phased).
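The difference between 0/1/2 values for phased data and 0/1 values for unphased data follows from how many haplotype copies each individual contributes. The Python sketch below is an illustration under stated assumptions, not the package's code. It assumes each individual is represented by a tuple of two strings (phased, one per gamete) or a one-element tuple (unphased), and that the columns are the top haplotype alleles. Counting exact matches over the tuple then gives at most 2 for phased input and at most 1 for unphased input. The helper name haplotype_dosage is hypothetical.

```python
def haplotype_dosage(ind_haps, top):
    """Individuals x haplotype-allele matrix of copy counts.

    ind_haps: per individual, a tuple of two strings (phased) or one
    string (unphased). top: the haplotype alleles used as columns.
    Summing exact matches over the tuple yields 0/1/2 for phased input
    and 0/1 for unphased input.
    """
    return [[sum(h == t for h in haps) for t in top] for haps in ind_haps]


phased = [("010", "110"), ("110", "101"), ("110", "110")]
print(haplotype_dosage(phased, ["110", "010"]))    # [[1, 1], [1, 0], [2, 0]]

unphased = [("012",), ("210",), ("012",)]
print(haplotype_dosage(unphased, ["012", "210"]))  # [[1, 0], [0, 1], [1, 0]]
```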