Creates a big.matrix on disk from any supported genotype source and
wraps it in the standard LDxBlocks_backend interface. Subsequent
calls to read_chunk() retrieve columns via memory-mapped I/O –
the OS pages in only the requested bytes on demand, so peak RAM is
proportional to the columns accessed, not the full matrix.
This is useful when:
- The filtered genotype matrix (individuals x SNPs) is too large to hold in RAM (> 4-8 GB for typical server configurations).
- The pipeline needs to be restarted after a previous run: the .bin and .desc files persist on disk and can be reused without re-loading the source VCF/GDS.
- Multiple R sessions or future workers need simultaneous read access to the same matrix (bigmemory's file-backed store is safe for concurrent reads).
Arguments
- source
Either an LDxBlocks_backend object (any format), a plain R matrix, or a path to a previously saved .desc file (reattach without reloading).
- snp_info
Data frame with SNP, CHR, POS. Required when source is a plain matrix or a .desc file (bigmemory does not store metadata). Optional and ignored when source is a file path or an LDxBlocks_backend; SNP info is obtained automatically from those sources.
- backingfile
Character. Stem for the .bin and .desc files. Default: a tempfile. Supply a persistent path to reuse across sessions.
- backingpath
Character. Directory for backing files. Default: tempdir().
- type
Storage type: "char" (1 byte per cell, values 0-2 fit, saves 8x vs double), "short" (2 bytes), or "double". Default "char".
- verbose
Logical. Default TRUE.
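As a rough illustration of the type trade-off, the backing file size is approximately individuals x SNPs x bytes per cell. The sketch below uses hypothetical dimensions (5,000 individuals, 1,000,000 SNPs) and element sizes of 1, 2, and 8 bytes; it ignores any file overhead.

```r
# Hypothetical dimensions, chosen only for illustration
n_ind <- 5000
n_snp <- 1e6

# Bytes per cell for each storage type
bytes_per_cell <- c(char = 1, short = 2, double = 8)

# Approximate backing-file size in GiB (cells x bytes per cell)
size_gib <- n_ind * n_snp * bytes_per_cell / 1024^3
print(round(size_gib, 2))
```

With these numbers "char" stays under 5 GiB, while "double" needs eight times as much.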
Value
An LDxBlocks_backend object with type = "bigmemory".
Use read_chunk(be, col_idx) and close_backend(be) as normal.
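A common access pattern is to walk the SNP columns in fixed-size chunks so that only one chunk is resident at a time. The sketch below shows only the index bookkeeping. chunk_indices() is a hypothetical helper, and a plain in-memory matrix with a small reader function stands in for the backend and read_chunk(be, col_idx), since how an LDxBlocks_backend exposes its column count is not described here.

```r
# Hypothetical helper: split 1..n_cols into consecutive chunks
# of at most chunk_size columns each
chunk_indices <- function(n_cols, chunk_size) {
  idx <- seq_len(n_cols)
  unname(split(idx, ceiling(idx / chunk_size)))
}

# Stand-in data: 4 individuals x 10 SNPs with dosages 0-2
set.seed(1)
geno <- matrix(sample(0:2, 40, replace = TRUE), nrow = 4)

# Stand-in for read_chunk(be, col_idx)
reader <- function(col_idx) geno[, col_idx, drop = FALSE]

# Accumulate per-SNP allele-frequency estimates one chunk at a time
freq <- numeric(ncol(geno))
for (col_idx in chunk_indices(ncol(geno), chunk_size = 4)) {
  chunk <- reader(col_idx)
  freq[col_idx] <- colMeans(chunk) / 2
}
```

In a real run the reader would be replaced by read_chunk(be, col_idx), and the chunk size would be chosen so that one chunk fits comfortably in RAM.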
Memory model
big.matrix stores the matrix as a raw binary file (.bin) with
a companion descriptor (.desc). The OS memory-maps the file:
read_chunk(be, col_idx) calls bigmemory::as.matrix(bm[, col_idx]),
which triggers page faults that load only the pages holding the requested columns.
In effect this matches the GDS streaming model, but it works for any input format
and avoids repeated snpgdsGetGeno() calls.
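The file-backed mechanism can be seen in isolation using bigmemory alone. This is a minimal sketch, not the package's implementation: the file names, dimensions, and toy dosages are made up for illustration, and it assumes the bigmemory package is installed.

```r
library(bigmemory)

# Silence the down-cast warning when writing integers into a "char" matrix
options(bigmemory.typecast.warning = FALSE)

# Toy dimensions and file locations (illustrative only)
n_ind <- 100
n_snp <- 500
dir <- tempfile("bm_demo")
dir.create(dir)

# Create the .bin backing file and its .desc descriptor
bm <- filebacked.big.matrix(
  nrow = n_ind, ncol = n_snp, type = "char",
  backingfile = "demo.bin", descriptorfile = "demo.desc",
  backingpath = dir
)

# Fill with toy dosages 0-2
set.seed(42)
bm[, ] <- matrix(sample(0:2, n_ind * n_snp, replace = TRUE), nrow = n_ind)

# Reattach from the descriptor alone, as a later session would
bm2 <- attach.big.matrix(file.path(dir, "demo.desc"))

# Subsetting returns an ordinary R matrix for just these columns
col_idx <- c(10, 250, 499)
chunk <- bm2[, col_idx]
dim(chunk)
```

Reattaching through the descriptor does not copy the data; both bm and bm2 refer to the same backing file.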
Examples
if (FALSE) { # \dontrun{
# Convert a GDS backend to a persistent bigmemory store
be_gds <- read_geno("mydata.gds")
be_bm <- read_geno_bigmemory(be_gds,
                             backingfile = "mydata_bm",
                             backingpath = "/data/ldxblocks")
close_backend(be_gds)
# All subsequent runs reattach without reloading
be_bm2 <- read_geno_bigmemory("/data/ldxblocks/mydata_bm.desc")
blocks <- run_Big_LD_all_chr(be_bm2, CLQcut = 0.70)
close_backend(be_bm2)
# From a plain matrix
be_bm <- read_geno_bigmemory(ldx_geno, snp_info = ldx_snp_info)
} # }