Prepare Genomic Prediction Inputs for External GBLUP Software
Source: R/haplotypes.R
Assembles the inputs required to fit a GBLUP model in external software (rrBLUP, sommer, ASReml-R, BGLR) and subsequently run the Tong et al. (2025) haplotype stacking pipeline. Returns the VanRaden (2008) genomic relationship matrix (GRM) computed from the haplotype feature matrix, aligned with a user-supplied phenotype table.
Arguments
- hap_matrix
  Numeric matrix (individuals x haplotype alleles) from build_haplotype_feature_matrix.
- pheno_df
  Data frame of phenotypes. Must contain:
  - An ID column (set via id_col, default "id"). Values must match rownames(hap_matrix) exactly (case-sensitive, no leading/trailing spaces).
  - One or more numeric trait columns (referenced via trait_col). Column names are arbitrary. NA values in trait columns are allowed.
  Minimal single-trait format:
    id    YLD
    G001  4.21
    G002  3.87
    G003  NA
- id_col
  Name of the individual ID column in pheno_df. Default "id".
- trait_col
  Name of the trait column to extract as a numeric vector. Default NULL: no y_vec is returned, only the aligned data frame (alongside G).
- bend
  Logical. If TRUE, adds 0.001 to the diagonal of G to ensure positive-definiteness. Default TRUE (recommended for mixed model solvers).
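The exact-match requirement on the ID column can be checked before calling the function. The following is a minimal base-R sketch; the toy objects hap_matrix_toy and pheno_toy and the helper check_ids are hypothetical illustrations, not part of LDxBlocks.

```r
# Toy stand-ins (hypothetical): a 3 x 2 haplotype matrix and a phenotype table.
hap_matrix_toy <- matrix(
  c(0, 1, 2, 1, 0, 2), nrow = 3,
  dimnames = list(c("G001", "G002", "G003"), c("b1_h1", "b1_h2"))
)
pheno_toy <- data.frame(
  id  = c("G001", "g002", "G003 ", "G004"),  # wrong case, trailing space, extra ID
  YLD = c(4.21, 3.87, NA, 5.02),
  stringsAsFactors = FALSE
)

# Hypothetical helper: report which IDs would pass or fail an exact match.
check_ids <- function(ids, ref_ids) {
  ids <- as.character(ids)
  list(
    matched    = ids[ids %in% ref_ids],    # usable as-is
    unmatched  = ids[!ids %in% ref_ids],   # would not align with hap_matrix
    whitespace = ids[ids != trimws(ids)],  # leading/trailing spaces
    no_pheno   = setdiff(ref_ids, ids)     # genotyped, but no phenotype row
  )
}

res <- check_ids(pheno_toy$id, rownames(hap_matrix_toy))
str(res)
```

Here only "G001" matches exactly; "g002" fails on case and "G003 " on the trailing space, so both would need cleaning before alignment.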
Value
A named list:
- G
  VanRaden GRM (n x n), aligned to individuals present in both hap_matrix and pheno_df.
- pheno_df
  Phenotype data frame subset and reordered to match rows of G.
- y_vec
  Named numeric vector of the requested trait (only if trait_col is supplied). NA values are preserved so the user can decide how to handle them (e.g. trait values masked as NA for prediction-only individuals in a training/validation split).
- n_train
  Number of individuals with non-missing trait values.
- n_predict
  Number of individuals with missing trait values (prediction candidates).
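To make the shape of the returned list concrete, the sketch below assembles comparable elements in base R. It is an illustration under stated assumptions, not the LDxBlocks source: sketch_gblup_inputs is a hypothetical stand-in, the matrix is assumed to hold 0/1/2 allele dosages with no missing values, and the GRM uses VanRaden (2008) method 1.

```r
# Hypothetical stand-in for illustration only; not the LDxBlocks implementation.
sketch_gblup_inputs <- function(hap_matrix, pheno_df, id_col = "id",
                                trait_col = NULL, bend = TRUE) {
  # Keep individuals present in both inputs, in hap_matrix row order.
  ids   <- intersect(rownames(hap_matrix), as.character(pheno_df[[id_col]]))
  M     <- hap_matrix[ids, , drop = FALSE]
  pheno <- pheno_df[match(ids, pheno_df[[id_col]]), , drop = FALSE]
  rownames(pheno) <- NULL

  # VanRaden method 1: G = ZZ' / (2 * sum(p * (1 - p))), assuming 0/1/2 dosages.
  p    <- colMeans(M) / 2
  poly <- p > 0 & p < 1                         # drop monomorphic columns
  Z    <- sweep(M[, poly, drop = FALSE], 2, 2 * p[poly])
  G    <- tcrossprod(Z) / (2 * sum(p[poly] * (1 - p[poly])))
  if (bend) diag(G) <- diag(G) + 0.001          # the documented diagonal bend

  out <- list(G = G, pheno_df = pheno)
  if (!is.null(trait_col)) {
    y <- pheno[[trait_col]]
    names(y) <- ids
    out$y_vec     <- y
    out$n_train   <- sum(!is.na(y))             # non-missing trait values
    out$n_predict <- sum(is.na(y))              # prediction candidates
  }
  out
}

# Toy data: five genotyped lines, four phenotype rows (one ID not genotyped).
set.seed(42)
hap <- matrix(rbinom(5 * 30, 2, 0.4), nrow = 5,
              dimnames = list(sprintf("G%03d", 1:5), NULL))
pheno <- data.frame(id  = c("G003", "G001", "G009", "G002"),
                    YLD = c(NA, 4.21, 5.00, 3.87))
inp_toy <- sketch_gblup_inputs(hap, pheno, trait_col = "YLD")
```

In this toy case G is 3 x 3 (G001, G002, G003), the phenotype rows are reordered to match, and G003 is counted as a prediction candidate because its YLD is NA.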
Workflow
LDxBlocks handles genotype processing and block detection. Phenotype handling and GBLUP fitting are intentionally left to dedicated R packages because phenotype data requires preprocessing (multi-environment adjustment, outlier removal, covariate inclusion) that is dataset-specific. The handoff is:
1. LDxBlocks (this function): produce aligned G matrix and phenotype vector.
2. External GBLUP (rrBLUP / sommer / ASReml-R / BGLR): fit the model, obtain GEBV.
3. LDxBlocks (backsolve_snp_effects + compute_local_gebv): derive block-level haplotype effects from the GEBV.
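The last step of the handoff can be illustrated with the standard SNP-BLUP back-solving identity, a = Z' G^{-1} g / k, followed by summing marker effects within blocks. This is a generic base-R sketch and may differ from what backsolve_snp_effects and compute_local_gebv do internally; the simulated genotypes, the stand-in GEBV, and the blocks list are all hypothetical.

```r
set.seed(7)
n <- 8; m <- 12
M <- matrix(rbinom(n * m, 2, 0.35), nrow = n,
            dimnames = list(sprintf("G%03d", 1:n), sprintf("snp%02d", 1:m)))

# Centered genotypes and VanRaden-style GRM (assumed 0/1/2 coding).
p <- colMeans(M) / 2
Z <- sweep(M, 2, 2 * p)
k <- 2 * sum(p * (1 - p))
G <- tcrossprod(Z) / k
diag(G) <- diag(G) + 0.001             # bend, as with bend = TRUE

# Stand-in GEBV; in practice these come from rrBLUP / sommer / BGLR.
gebv <- as.numeric(scale(rnorm(n), scale = FALSE))
names(gebv) <- rownames(M)

# Back-solve marker effects from the GEBV: a = Z' G^{-1} g / k.
snp_fx <- drop(crossprod(Z, solve(G, gebv))) / k

# Hypothetical block map: SNP column indices belonging to each block.
blocks <- list(block1 = 1:4, block2 = 5:9, block3 = 10:12)

# Local GEBV per block: centered genotypes times the effects of that block's SNPs.
local_gebv <- sapply(blocks, function(idx) {
  drop(Z[, idx, drop = FALSE] %*% snp_fx[idx])
})
```

Because the blocks partition the markers, the row sums of local_gebv equal Z %*% snp_fx, which reproduces the GEBV up to the small shrinkage introduced by the diagonal bend.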
Example GBLUP calls after this function
# rrBLUP
library(rrBLUP)
fit <- kin.blup(data = inp$pheno_df, geno = "id",
pheno = "trait", K = inp$G)
gebv <- fit$g
# sommer
library(sommer)
fit <- sommer::mmes(trait ~ 1, random = ~vsm(ism(id), Gu = inp$G),
data = inp$pheno_df)
gebv <- fit$U$`u:id`$trait
# BGLR
library(BGLR)
fit <- BGLR(y = inp$y_vec, ETA = list(list(K = inp$G, model = "RKHS")))
gebv <- fit$yHat
References
Tong J et al. (2025). Haplotype stacking to improve stability of stripe rust resistance in wheat. Theoretical and Applied Genetics 138:267. doi:10.1007/s00122-025-05045-0
VanRaden PM (2008). Efficient methods to compute genomic predictions. Journal of Dairy Science 91(11):4414-4423. doi:10.3168/jds.2007-0980
Examples
if (FALSE) { # \dontrun{
# After building haplotype feature matrix:
feat <- build_haplotype_feature_matrix(haps, top_n = 5)$matrix
pheno <- read.csv("phenotypes.csv") # columns: id, YLD, PHT, ...
inp <- prepare_gblup_inputs(feat, pheno, id_col = "id",
trait_col = "YLD")
# Fit GBLUP with rrBLUP
library(rrBLUP)
fit <- kin.blup(data = inp$pheno_df, geno = "id",
pheno = "YLD", K = inp$G)
gebv <- fit$g
# Then derive block-level haplotype effects
snp_fx <- backsolve_snp_effects(geno_matrix, gebv, G = inp$G)
loc <- compute_local_gebv(geno_matrix, snp_info, blocks, snp_fx)
} # }
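Because NA values in y_vec mark prediction candidates, a training/validation split can be set up by masking the validation individuals before fitting. A minimal sketch, assuming a toy vector in place of inp$y_vec; the IDs, values, and the choice of val_ids are illustrative only.

```r
# Toy phenotype vector standing in for inp$y_vec (values are illustrative).
y_vec <- c(G001 = 4.21, G002 = 3.87, G003 = NA, G004 = 5.10, G005 = 4.45)

# Hypothetical validation set: observed lines whose phenotypes are hidden.
val_ids <- c("G002", "G005")

y_masked <- y_vec
y_masked[val_ids] <- NA              # treated as prediction candidates by the solver

n_train   <- sum(!is.na(y_masked))   # individuals informing the model
n_predict <- sum(is.na(y_masked))    # masked plus originally missing

# After fitting on y_masked, predictive ability could be estimated as
# cor(gebv[val_ids], y_vec[val_ids]) using the held-back observations.
```

The unmasked y_vec is kept so the held-back observations remain available for evaluating the predictions.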