OptiSparseMET

Overview

OptiSparseMET is an R framework for constructing sparse multi-environment trials (METs) that jointly addresses treatment allocation across environments and field design within environments – under shared statistical, genetic, and logistical constraints.

The package targets a structural challenge common to modern breeding programs: the number of candidate lines routinely exceeds what any single environment can accommodate, environments are heterogeneous rather than interchangeable, and seed availability imposes limits that purely theoretical designs ignore. Standard MET tools typically address one of these problems at a time. OptiSparseMET integrates all of them within a single workflow.

Conceptual Framework

Sparse MET design is a two-level problem and should be treated as such.

Level 1 — Across-environment allocation determines which treatments appear in which environments, how many times each treatment is replicated across the trial, and whether the resulting incidence structure preserves sufficient genetic connectivity for valid cross-environment inference.

Level 2 — Within-environment design determines blocking structure, spatial layout, replication within each environment, and local control of field heterogeneity.

These two levels are not merely sequential steps — they are statistically coupled, and optimizing them independently produces inferior designs. The linkage operates through four mechanisms.

1. The incidence matrix couples both levels inside the information matrix.

In the linear mixed model $y = X\beta + Zg + e$, with $g \sim N(0, K\sigma_g^2)$ and $e \sim N(0, R\sigma_e^2)$, the marginal covariance is $V = ZKZ^\top \sigma_g^2 + R\sigma_e^2$ and the prediction error variance of the genetic values is $\text{PEV} = G - GZ^\top P Z G$, where $G = K\sigma_g^2$ and $P = V^{-1} - V^{-1}X(X^\top V^{-1} X)^{-1} X^\top V^{-1}$. Equivalently, PEV is the genetic block of $C^{-1}$, where $C$ is the coefficient matrix of the mixed model equations. The allocation decision determines the sparsity pattern of $Z$ (which lines appear where); the within-environment blocking structure determines $R$ (the residual covariance). Both enter $V$ and therefore $C^{-1}$. Neither can be optimized in isolation because they interact inside the inversion of $V$.
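The coupling can be checked numerically. The base-R sketch below uses a made-up toy trial (three lines, two environments, four plots; the X, Z, K and R values are illustrative assumptions, not package data) and computes PEV twice: from the $V$-based expression and from the inverse of the mixed-model-equation coefficient matrix. Both routes agree, and both change whenever $Z$ (allocation) or $R$ (field structure) changes.

```r
# Toy example (made-up data): 3 lines, 2 environments, 4 plots.
# Plot order: (L1, E1), (L2, E1), (L1, E2), (L3, E2)
X <- cbind(E1 = c(1, 1, 0, 0), E2 = c(0, 0, 1, 1))          # environment fixed effects
Z <- rbind(c(1, 0, 0), c(0, 1, 0), c(1, 0, 0), c(0, 0, 1))  # allocation: plot -> line
K <- matrix(c(1,    0.5, 0.25,
              0.5,  1,   0.5,
              0.25, 0.5, 1), nrow = 3)                      # relationship matrix
R <- diag(4)                                                # residual structure (no blocking)
sg2 <- 1; se2 <- 2

# Route 1: PEV = G - G Z' P Z G, with P built from V = Z K Z' sg2 + R se2
G  <- K * sg2
V  <- Z %*% G %*% t(Z) + R * se2
Vi <- solve(V)
P  <- Vi - Vi %*% X %*% solve(t(X) %*% Vi %*% X) %*% t(X) %*% Vi
pev_v <- unname(diag(G - G %*% t(Z) %*% P %*% Z %*% G))

# Route 2: genetic block of the inverse mixed-model-equation coefficient matrix
Ri <- solve(R * se2)
C  <- rbind(cbind(t(X) %*% Ri %*% X, t(X) %*% Ri %*% Z),
            cbind(t(Z) %*% Ri %*% X, t(Z) %*% Ri %*% Z + solve(G)))
pev_mme <- unname(diag(solve(C))[ncol(X) + seq_len(ncol(Z))])

all.equal(pev_v, pev_mme)   # TRUE: both routes give the same PEV
```

Moving a line to another environment (editing a row of `Z`) or replacing `R` with a non-diagonal, block-structured matrix changes `pev_v` and `pev_mme` together.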

2. Allocation fixes which within-environment designs are feasible.

Once allocation assigns $k_e$ lines to environment $e$, the within-environment design must arrange exactly those $k_e$ treatments across the available $n_{\text{rows}} \times n_{\text{cols}}$ field. If $k_e$ is incompatible with the blocking structure — for example, not a multiple of the target block size, or exceeding field capacity — the design is infeasible regardless of how statistically ideal the allocation was. Allocation and field geometry must be co-designed.
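A quick arithmetic check makes this concrete. `layout_feasible()` below is a hypothetical helper written for illustration (it is not an OptiSparseMET function); it assumes every check occupies one plot in every block and every entry occupies one plot.

```r
# Hypothetical helper (not an OptiSparseMET function): can k_e entries be laid
# out in equal blocks, with every check repeated once per block, in the field?
layout_feasible <- function(k_e, n_checks, n_blocks, n_rows, n_cols) {
  plots_needed    <- k_e + n_checks * n_blocks
  plots_available <- n_rows * n_cols
  list(
    equal_blocks    = k_e %% n_blocks == 0,   # same number of entries per block
    fits_field      = plots_needed <= plots_available,
    plots_needed    = plots_needed,
    plots_available = plots_available
  )
}

layout_feasible(k_e = 41,  n_checks = 2, n_blocks = 4, n_rows = 10, n_cols = 12)
# equal_blocks FALSE (41 is not a multiple of 4), fits_field TRUE (49 <= 120)
layout_feasible(k_e = 120, n_checks = 2, n_blocks = 4, n_rows = 10, n_cols = 12)
# equal_blocks TRUE, fits_field FALSE (128 > 120)
```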

3. Block efficiency propagates into cross-environment inference.

The precision of a genetic value estimate for line $j$ in environment $e$ is approximately proportional to $e_j \, r_j^{(e)}$, where $r_j^{(e)}$ is the number of plots of line $j$ in environment $e$ and $e_j \in (0, 1]$ is the efficiency factor of the within-environment design relative to a completely randomized layout with the same replication and residual variance. A poor block design reduces $e_j$, inflating the variance of each BLUP. These inflated variances propagate into cross-environment covariance estimates, degrading G×E inference and genetic correlation estimation even when the allocation incidence structure is perfectly balanced.
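For an equireplicate design with equal block sizes, the average efficiency factor can be computed from the treatment-by-block incidence matrix $N$ as the harmonic mean of the non-zero eigenvalues of $I - NN^\top/(rk)$ (the canonical efficiency factors). The sketch below is a generic base-R illustration of that textbook calculation, not the package's efficiency code; for a balanced incomplete block design it should recover $\lambda v / (rk)$, which is 2/3 for four treatments arranged in all six pairs.

```r
# Average efficiency factor of an equireplicate incomplete block design with
# equal block sizes (generic textbook calculation, not OptiSparseMET code).
avg_efficiency_factor <- function(N) {
  r <- unique(rowSums(N))   # replication of each treatment
  k <- unique(colSums(N))   # size of each block
  stopifnot(length(r) == 1, length(k) == 1)
  v <- nrow(N)
  A <- diag(v) - tcrossprod(N) / (r * k)   # information matrix divided by r
  ev <- eigen(A, symmetric = TRUE, only.values = TRUE)$values
  ev <- ev[ev > 1e-8]                      # drop the zero eigenvalue (overall mean)
  if (length(ev) < v - 1) stop("design is disconnected")
  length(ev) / sum(1 / ev)                 # harmonic mean of canonical efficiencies
}

# BIBD: 4 treatments, all 6 pairs as blocks of size 2 (r = 3, lambda = 1)
pairs  <- combn(4, 2)
N_bibd <- matrix(0, nrow = 4, ncol = ncol(pairs))
for (j in seq_len(ncol(pairs))) N_bibd[pairs[, j], j] <- 1
avg_efficiency_factor(N_bibd)   # 2/3, i.e. lambda * v / (r * k)

# Disconnected design: {1, 2} and {3, 4} never share a block
N_disc <- cbind(c(1, 1, 0, 0), c(1, 1, 0, 0), c(0, 0, 1, 1), c(0, 0, 1, 1))
try(avg_efficiency_factor(N_disc))   # error: design is disconnected
```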

4. CDmean — the genomic prediction criterion — depends on both levels.

The CDmean criterion, $\text{CDmean} = 1 - \overline{\text{PEV}} / \sigma_g^2$, where $\overline{\text{PEV}}$ is the mean PEV over candidate lines (this form assumes a relationship matrix with unit diagonal), cannot be maximized by fixing either level independently, because PEV depends on both $Z$ (allocation) and $R^{-1}$ (blocking). Spreading genetically diverse lines across environments improves the genomic connectivity captured in $Z^\top R^{-1} Z$; efficient blocking sharpens $R^{-1}$. Both contributions are necessary.
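A toy comparison illustrates the point. The base-R sketch below computes CDmean as defined above for two allocations of the same four lines: one that places related lines together in the same environment, and one that spreads each related pair across environments. The relationship matrix, layout and variance components are made-up assumptions, and this is not how `met_evaluate_famoptg_efficiency()` or `met_evaluate_alpha_efficiency()` are implemented.

```r
# Toy CDmean comparison (made-up K, layout and variances; not package code).
cdmean <- function(line_of_plot, env_of_plot, K, sg2 = 1, se2 = 1) {
  X  <- model.matrix(~ 0 + factor(env_of_plot))          # environment fixed effects
  Z  <- outer(line_of_plot, seq_len(nrow(K)), "==") * 1  # plot-to-line incidence
  Ri <- diag(length(line_of_plot)) / se2                 # independent residuals
  C  <- rbind(cbind(t(X) %*% Ri %*% X, t(X) %*% Ri %*% Z),
              cbind(t(Z) %*% Ri %*% X, t(Z) %*% Ri %*% Z + solve(K * sg2)))
  pev <- diag(solve(C))[ncol(X) + seq_len(ncol(Z))]      # genetic block of C^-1
  1 - mean(pev) / sg2                                    # assumes diag(K) == 1
}

# Two families of two closely related lines each (relationship 0.9 within family)
K   <- kronecker(diag(2), matrix(c(1, 0.9, 0.9, 1), 2))
env <- c("E1", "E1", "E2", "E2")

cd_clustered <- cdmean(line_of_plot = c(1, 2, 3, 4), env_of_plot = env, K = K)
cd_spread    <- cdmean(line_of_plot = c(1, 3, 2, 4), env_of_plot = env, K = K)
c(clustered = cd_clustered, spread = cd_spread)
```

With these values the clustered allocation gives a CDmean of about 0.005 and the spread allocation about 0.31: when both members of a family share an environment, the environment effect absorbs almost all the information about that family's mean.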

OptiSparseMET formalizes the link between the two levels: the allocation output specifies exactly which lines enter each environment, and the within-environment design engine receives precisely that set, ensuring that the incidence structure and the blocking structure are optimized consistently within the same statistical framework.

Statistical Foundations

Sparse testing identity

The theoretical basis follows Montesinos-Lopez et al. (2023), who formalize the resource identity underlying balanced sparse designs:

\[N = J \times r = I \times k\]

Symbol	Meaning
$J$	Total number of treatments
$I$	Number of environments
$k$	Number of treatments per environment
$r$	Number of environments per treatment

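Given fixed total resources $N$ and a fixed number of environments $I$ (so that $k = N/I$ is also fixed), this identity makes the tradeoff between coverage breadth (the number of treatments $J$) and replication depth ($r$) explicit: testing more treatments means fewer environments per treatment.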
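For a fixed budget $N$ and a fixed number of environments $I$, $k = N/I$ is fixed, so the tradeoff plays out between $J$ and $r$. The short base-R sketch below (a hypothetical helper, not a package function) lists the whole-number combinations.

```r
# Hypothetical helper: for a fixed plot budget N and I environments (so k = N / I),
# list how many treatments J can be tested at each replication level r.
breadth_vs_depth <- function(N, I) {
  stopifnot(N %% I == 0)   # k = N / I plots per environment must be whole
  r  <- seq_len(I)         # a treatment can appear in at most I environments
  ok <- N %% r == 0        # J = N / r must be a whole number
  data.frame(r = r[ok], J = N / r[ok], k = N / I)
}

breadth_vs_depth(N = 240, I = 4)   # r = 1..4 with J = 240, 120, 80, 60; k = 60
breadth_vs_depth(N = 220, I = 4)   # r = 3 drops out (220 / 3 is not whole)
```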

Allocation strategies

Two strategies are available, corresponding to M3 and M4 in Montesinos-Lopez et al. (2023):

Strategy	Argument	Properties
M3-inspired	`random_balanced`	Coverage-first stochastic allocation; guarantees every treatment appears at least once; tolerates unequal environment sizes
M4 BIBD	`balanced_incomplete`	Enforces equal replication and equal environment sizes (slot identity J* × r = I × k*); pairwise co-occurrence is computed and returned but not enforced

M4 – equal replication and equal environment sizes

The M4 method (Montesinos-Lopez et al. 2023) enforces two structural guarantees:

Equal replication — every non-common treatment appears in exactly $r$ environments.
Equal environment sizes — every environment receives exactly k* sparse treatments, enforcing the resource identity J* × r = I × k* exactly, where J* = J − C and k* = k − C (C = number of common treatments).

These are the guarantees that distinguish M4 from M3 in the paper. In plant breeding programs where thousands of lines are tested across a few environments, equal replication means every candidate is evaluated the same number of times – a fundamental fairness and precision requirement.

allow_approximate = FALSE (the default) enforces both conditions strictly. If the slot identity J* × r = I × k* does not hold for the chosen dimensions, the function stops with a clear error before any allocation is attempted. Use check_balanced_incomplete_feasibility() to verify the slot identity first, or adjust k and r so that the identity holds. Construction uses a greedy load-balanced constructor that guarantees equal replication.

allow_approximate = TRUE relaxes the slot identity and allows minor replication imbalances. This is a fallback for exploratory use, not the primary mode.
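The package's constructor is described only as greedy and load-balanced. The base-R sketch below shows one way such a constructor can work; it is a hypothetical illustration, not OptiSparseMET's code. Each sparse treatment is assigned, in turn, to the $r$ environments with the most free slots. When the slot identity holds and $r \le I$, free-slot counts never differ by more than one between environments, so every environment ends with exactly $k^*$ treatments and every treatment with exactly $r$.

```r
# Hypothetical sketch of a greedy load-balanced constructor (not OptiSparseMET's code).
greedy_balanced <- function(treatments, environments, r, k_star) {
  J <- length(treatments); I <- length(environments)
  if (r > I) stop("r cannot exceed the number of environments")
  if (J * r != I * k_star) stop("slot identity J* x r = I x k* does not hold")
  remaining <- setNames(rep(k_star, I), environments)   # free slots per environment
  inc <- matrix(0L, J, I, dimnames = list(treatments, environments))
  for (trt in treatments) {
    # the r environments with the most free slots (ties keep the listed order)
    chosen <- names(remaining)[order(-remaining)][seq_len(r)]
    inc[trt, chosen] <- 1L
    remaining[chosen] <- remaining[chosen] - 1
  }
  inc
}

inc <- greedy_balanced(paste0("L", 1:110), c("E1", "E2", "E3", "E4"), r = 2, k_star = 55)
table(rowSums(inc))   # every sparse treatment in exactly 2 environments
colSums(inc)          # 55 treatments per environment
crossprod(inc)        # pairwise co-occurrence of environments
```

Because treatments are processed in a fixed order, this version pairs E1 only with E2 and E3 only with E4 (`crossprod(inc)` is zero for every other pair), which is why pairwise co-occurrence is worth inspecting even when replication is perfectly equal.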

Genetic connectedness

OptiSparseMET addresses genetic disconnectedness through three mechanisms: common treatments forced into every environment establish model-free cross-environment connectivity; family-based allocation distributes each family group across environments; and GRM/A-based allocation uses genomic or pedigree relationships to prevent genetic clustering.

Seed constraints

assign_replication_by_seed() takes a data frame of available seed quantities and a per-plot seed requirement and returns a replication plan that respects those constraints – making designs deployable rather than merely theoretically optimal.
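The core arithmetic is the number of whole plots each line's seed stock can fill. The sketch below is a hypothetical simplification, not `assign_replication_by_seed()` itself: it ignores `max_prep`, `priority` and `shortage_action`, and simply assigns a role from that count.

```r
# Hypothetical simplification (not assign_replication_by_seed()): role of each
# line from the number of plots its seed stock can fill.
seed_roles <- function(seed_available, seed_per_plot, desired_reps = 2, buffer = 0) {
  plots <- floor((seed_available - buffer) / seed_per_plot)  # whole plots affordable
  ifelse(plots >= desired_reps, "replicated",
         ifelse(plots >= 1, "unreplicated", "excluded"))
}

seed_roles(c(L001 = 45, L002 = 15, L003 = 8), seed_per_plot = 10)
# L001 replicated (4 plots possible), L002 unreplicated (1), L003 excluded (0)
```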

Main Functions

Across-environment allocation

Function	Description
`allocate_sparse_met()`	Distribute treatments across environments (M3 or M4) with enforced equal replication under M4
`check_balanced_incomplete_feasibility()`	Verify the slot identity J* × r = I × k* before attempting M4 allocation
`derive_allocation_groups()`	Derive grouping structure from family labels, GRM, or pedigree matrix

Feasibility and capacity helpers

Function	Description
`suggest_safe_k()`	Propose a safe `n_test_entries_per_environment` value
`min_k_for_full_coverage()`	Compute the minimum capacity for full treatment coverage
`warn_if_k_too_small()`	Non-fatal pre-flight capacity check

Call suggest_safe_k() or min_k_for_full_coverage() before allocate_sparse_met() whenever trial dimensions change.
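The capacity arithmetic behind these checks, as used in the Quick Start comments, is $k_{\min} = C + \lceil (J - C)/I \rceil$. The helper below is a hypothetical base-R restatement of that formula, not the package's `min_k_for_full_coverage()`, whose exact signature may differ.

```r
# Hypothetical restatement of the capacity arithmetic (not the package function):
# C common lines take C slots everywhere; the J - C sparse lines need
# ceiling((J - C) / I) further slots per environment to appear at least once.
min_k_full_coverage <- function(J, I, C = 0) C + ceiling((J - C) / I)

min_k_full_coverage(J = 120, I = 4, C = 10)       # 38
min_k_full_coverage(J = 120, I = 4, C = 10) + 3   # 41
min_k_full_coverage(J = 120, I = 4)               # 30; + 3 gives 33
```

Adding a buffer of 3 reproduces the values 41 (10 common treatments) and 33 (no common treatments) used in the examples.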

Seed-aware replication

Function	Description
`assign_replication_by_seed()`	Partition treatments into replicated, unreplicated, and excluded roles based on seed availability

Within-environment field design

Function	Description
`met_prep_famoptg()`	Augmented, p-rep, and RCBD-type repeated-check block designs
`met_alpha_rc_stream()`	Alpha row-column stream designs for fixed-grid field deployment
`met_evaluate_famoptg_efficiency()`	A/D/CDmean efficiency evaluation for `met_prep_famoptg()` designs
`met_evaluate_alpha_efficiency()`	A/D/CDmean efficiency evaluation for `met_alpha_rc_stream()` designs
`met_optimize_famoptg()`	Random Restart optimisation for `met_prep_famoptg()` designs
`met_optimize_alpha_rc()`	RS/SA/GA optimisation for `met_alpha_rc_stream()` designs

Pipeline and assembly

Function	Description
`plan_sparse_met_design()`	End-to-end two-stage pipeline in a single call
`combine_met_fieldbooks()`	Stack environment-level field books into one MET field book

Workflow

STEP 0  Verify capacity
        suggest_safe_k()  OR  min_k_for_full_coverage()
        |
STEP 1  Allocate treatments across environments
        allocate_sparse_met()
        |
STEP 2  Define replication based on seed availability
        assign_replication_by_seed()
        |
STEP 3  Build within-environment field designs
        met_prep_famoptg()  OR  met_alpha_rc_stream()
        |
STEP 4  Assemble the combined MET field book
        combine_met_fieldbooks()

Or run the entire pipeline in one call: plan_sparse_met_design().

Pipeline inputs: required and optional

Before running any pipeline function, it helps to know exactly what each function needs. The tables below list every input for the four main pipeline functions, distinguishing what is strictly required from what is optional.

`allocate_sparse_met()` — across-environment allocation

Required inputs

Argument	Type	Description
`treatments`	character vector	All candidate line IDs (J total)
`environments`	character vector	Environment names (≥ 2)
`allocation_method`	character	`"random_balanced"` (M3) or `"balanced_incomplete"` (M4)
`n_test_entries_per_environment`	integer	Total entries per environment including common treatments (k)

Optional inputs

Argument	Default	Description
`target_replications`	inferred	Target environments per sparse line (r); computed from slot identity if NULL
`common_treatments`	none	Lines forced into every environment before sparse allocation
`allow_approximate`	`FALSE`	`FALSE` = strict equal replication; `TRUE` = relaxed fallback
`allocation_group_source`	`"none"`	Genetic grouping: `"Family"`, `"GRM"`, or `"A"`
`treatment_info`	NULL	Data frame with `Treatment` and `Family` columns (required when `allocation_group_source = "Family"`)
`GRM`	NULL	Genomic relationship matrix (required when `allocation_group_source = "GRM"`)
`A`	NULL	Pedigree relationship matrix (required when `allocation_group_source = "A"`)
`min_groups_per_environment`	NULL	Minimum genetic groups per environment
`min_env_per_group`	NULL	Minimum environments per genetic group
`seed`	NULL	Integer seed for reproducibility

`assign_replication_by_seed()` — seed-aware replication

Required inputs

Argument	Type	Description
`treatments`	character vector	All candidate line IDs
`seed_available`	data frame	Must contain `Treatment` and `SeedAvailable` columns
`seed_required_per_plot`	integer	Seeds needed per plot (scalar or named vector per environment)
`replication_mode`	character	`"augmented"`, `"p_rep"`, or `"rcbd_type"`

Optional inputs

Argument	Default	Description
`desired_replications`	2	Target number of plots per replicated line
`shortage_action`	`"downgrade"`	What to do when seed is insufficient: `"downgrade"`, `"exclude"`, or `"error"`
`max_prep`	NULL	Maximum number of p-rep treatments (p-rep mode only)
`priority`	`"seed_available"`	Selection criterion for p-rep candidates
`minimum_seed_buffer`	0	Extra seeds reserved per line beyond the plot requirement
`seed`	NULL	Integer seed for reproducibility

`met_prep_famoptg()` — block-based field design

Required inputs

Argument	Type	Description
`check_treatments`	character vector	Check (control) treatment IDs; appear in every block
`check_families`	character vector	Family labels for checks; same length as `check_treatments`
`n_blocks`	integer	Number of incomplete blocks
`n_rows`	integer	Number of field rows
`n_cols`	integer	Number of field columns

At least one of p_rep_treatments or unreplicated_treatments must be supplied.

Optional inputs

Argument	Default	Description
`p_rep_treatments`	NULL	Treatments to replicate; typically `rep_plan$p_rep_treatments`
`p_rep_reps`	NULL	Replication count per p-rep line; typically `rep_plan$p_rep_reps`
`p_rep_families`	NULL	Family labels for p-rep treatments
`unreplicated_treatments`	NULL	Treatments to appear once; typically `rep_plan$unreplicated_treatments`
`unreplicated_families`	NULL	Family labels for unreplicated treatments
`replication_mode`	`"p_rep"`	`"p_rep"`, `"augmented"`, or `"rcbd_type"`
`cluster_source`	`"none"`	Genetic dispersion grouping: `"none"`, `"Family"`, `"GRM"`, `"A"`
`eval_efficiency`	`FALSE`	Compute A, D, CDmean efficiency metrics
`order`	`"row"`	Plot traversal order: `"row"`, `"col"`, or `"serpentine"`
`seed`	NULL	Integer seed for reproducibility

`met_alpha_rc_stream()` — row-column alpha design

Required inputs

Argument	Type	Description
`check_treatments`	character vector	Check treatment IDs; appear in every incomplete block
`check_families`	character vector	Family labels for checks
`entry_treatments`	character vector	Entry (non-check) treatment IDs
`entry_families`	character vector	Family labels for entries
`n_reps`	integer	Number of field replicates
`n_rows`	integer	Number of field rows
`n_cols`	integer	Number of field columns

Optional inputs

Argument	Default	Description
`min_block_size`	6	Minimum entries (excluding checks) per incomplete block
`max_block_size`	NULL	Maximum entries per incomplete block
`cluster_source`	`"none"`	Genetic dispersion grouping: `"none"`, `"Family"`, `"GRM"`, `"A"`
`eval_efficiency`	`FALSE`	Compute A, D, CDmean efficiency metrics
`order`	`"row"`	Plot traversal order: `"row"`, `"col"`, or `"serpentine"`
`serpentine`	`FALSE`	Reverse alternating rows/columns for physical continuity
`seed`	NULL	Integer seed for reproducibility

Minimum working example

The absolute minimum to run the full pipeline from allocation to field book:

library(OptiSparseMET)

## Minimum inputs: just lines, environments, and field dimensions
treatments <- paste0("L", sprintf("%03d", 1:120))
envs       <- c("E1", "E2", "E3", "E4")

## Stage 0: verify k
k <- suggest_safe_k(treatments, envs, buffer = 3)  # 33

## Stage 1: M3 allocation (no common treatments, no grouping)
alloc <- allocate_sparse_met(
  treatments                     = treatments,
  environments                   = envs,
  allocation_method              = "random_balanced",
  n_test_entries_per_environment = k,
  seed                           = 1
)

## Stage 2: seed plan (uniform seed, no shortage)
seed_df <- data.frame(
  Treatment     = treatments,
  SeedAvailable = 100L
)
rep_plan <- assign_replication_by_seed(
  treatments             = treatments,
  seed_available         = seed_df,
  seed_required_per_plot = 10L,
  replication_mode       = "augmented"
)

## Stage 3: within-environment design (checks + unreplicated entries)
design <- met_prep_famoptg(
  check_treatments        = c("CHK1", "CHK2"),
  check_families          = c("CHECK", "CHECK"),
  unreplicated_treatments = rep_plan$unreplicated_treatments,
  unreplicated_families   = rep("F1", length(rep_plan$unreplicated_treatments)),
  n_blocks = 4L, n_rows = 16L, n_cols = 8L,  # 120 entries + 2 checks x 4 blocks = 128 plots
  seed     = 1
)

## Stage 4: combine (only E1 is built here; supply one field book per environment)
met_book <- combine_met_fieldbooks(
  field_books = list(E1 = design$field_book)
)

Quick Start

library(OptiSparseMET)

treatments <- paste0("L", sprintf("%03d", 1:120))
envs       <- c("E1", "E2", "E3", "E4")
common     <- treatments[1:10]    # 10 common treatments -> J* = 110

# Step 0: verify minimum capacity before allocating
# J=120, C=10, I=4: J*=110, min k* = ceil(110/4) = 28, min k = 38
# suggest_safe_k adds a buffer of 3 on top of the minimum
suggest_safe_k(treatments, envs,
               common_treatments = common,
               buffer = 3)           # returns 41

# Before M4 allocation, verify the slot identity holds.
# With k=65, r=2, C=10: k*=55, J*=110. Identity: 110*2 = 4*55 = 220. Holds.
check_balanced_incomplete_feasibility(
  n_treatments_total             = 120,
  n_environments                 = 4,
  n_test_entries_per_environment = 65,
  target_replications            = 2,
  n_common_treatments            = 10
)
# feasible = TRUE: 110*2 = 4*55 = 220 (difference = 0)

# Step 1a: M3-inspired random balanced allocation
alloc_m3 <- allocate_sparse_met(
  treatments                     = treatments,
  environments                   = envs,
  allocation_method              = "random_balanced",
  n_test_entries_per_environment = 41,
  target_replications            = 1,
  common_treatments              = common,
  seed                           = 123
)

alloc_m3$summary$min_sparse_replication  # every treatment in >= 1 environment
alloc_m3$summary$mean_sparse_replication

# Step 1b: M4 BIBD allocation -- slot identity must hold exactly.
# J*=110, I=4, r=2: k* = 110*2/4 = 55, k = 55+10 = 65.
# Slot identity: 110*2 = 4*55 = 220. Satisfied.
alloc_m4 <- allocate_sparse_met(
  treatments                     = treatments,
  environments                   = envs,
  allocation_method              = "balanced_incomplete",
  n_test_entries_per_environment = 65,
  target_replications            = 2,
  common_treatments              = common,
  allow_approximate              = FALSE,
  seed                           = 123
)

alloc_m4$summary$min_sparse_replication  # 2: every sparse treatment in exactly 2 envs
alloc_m4$summary$max_sparse_replication  # 2: equal replication confirmed

# Step 2: seed-aware replication plan
set.seed(123)  # make the simulated seed stock reproducible
seed_df <- data.frame(
  Treatment     = treatments,
  SeedAvailable = sample(10:100, length(treatments), replace = TRUE)
)

rep_plan <- assign_replication_by_seed(
  treatments             = treatments,
  seed_available         = seed_df,
  seed_required_per_plot = 10,
  replication_mode       = "p_rep",
  desired_replications   = 2,
  max_prep               = 15,
  shortage_action        = "downgrade"
)

rep_plan$p_rep_treatments        # treatments receiving 2 plots
rep_plan$unreplicated_treatments # treatments receiving 1 plot
rep_plan$excluded_treatments     # treatments with insufficient seed

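# Step 3: within-environment field design
# In a real MET, pass only the lines allocated to this environment, and size
# n_rows x n_cols so the field holds every entry plot plus the checks in each block.
# For augmented / p-rep / RCBD-type block designs: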
design_E1 <- met_prep_famoptg(
  check_treatments        = c("CHK1", "CHK2"),
  check_families          = c("CHECK", "CHECK"),
  p_rep_treatments        = rep_plan$p_rep_treatments,
  p_rep_reps              = rep_plan$p_rep_reps,
  p_rep_families          = rep("F1", length(rep_plan$p_rep_treatments)),
  unreplicated_treatments = rep_plan$unreplicated_treatments,
  unreplicated_families   = rep("F1", length(rep_plan$unreplicated_treatments)),
  n_blocks = 4L, n_rows = 10L, n_cols = 12L,
  seed     = 123
)

# For alpha row-column designs:
design_E2 <- met_alpha_rc_stream(
  check_treatments = c("CHK1", "CHK2"),
  check_families   = c("CHECK", "CHECK"),
  entry_treatments = rep_plan$p_rep_treatments,
  entry_families   = rep("F1", length(rep_plan$p_rep_treatments)),
  n_reps = 2L, n_rows = 8L, n_cols = 10L,
  seed   = 123
)

# Step 4: combine into single MET field book
met_book <- combine_met_fieldbooks(
  field_books       = list(E1 = design_E1$field_book,
                           E2 = design_E2$field_book),
  local_designs     = c(E1 = "met_prep_famoptg",
                        E2 = "met_alpha_rc_stream"),
  replication_modes = c(E1 = "p_rep", E2 = "met_alpha_rc_stream"),
  sparse_method     = "random_balanced",
  common_treatments = common
)

head(met_book[, 1:8])

End-to-end pipeline

For standard workflows, plan_sparse_met_design() handles steps 1–4 in a single call using env_design_specs, a named list where each environment is mapped to its local design arguments. Set design = "met_prep_famoptg" or design = "met_alpha_rc_stream" in each spec:

env_specs <- list(
  E1 = list(
    design               = "met_prep_famoptg",
    replication_mode     = "p_rep",
    desired_replications = 2L,
    max_prep             = 15L,
    shortage_action      = "downgrade",
    check_treatments     = c("CHK1", "CHK2"),
    check_families       = c("CHECK", "CHECK"),
    n_blocks = 4L, n_rows = 10L, n_cols = 12L
  ),
  E2 = list(
    design           = "met_alpha_rc_stream",
    check_treatments = c("CHK1", "CHK2"),
    check_families   = c("CHECK", "CHECK"),
    n_reps = 2L, n_rows = 8L, n_cols = 10L
  )
)

out <- plan_sparse_met_design(
  treatments                     = treatments,
  environments                   = envs[1:2],
  allocation_method              = "random_balanced",
  n_test_entries_per_environment = 65,  # 2 envs, C = 10: k* = ceil(110 / 2) = 55, k = 65
  target_replications            = 1,
  common_treatments              = common,
  env_design_specs               = env_specs,
  seed_info                      = seed_df,
  seed_required_per_plot         = data.frame(
    Environment         = envs[1:2],
    SeedRequiredPerPlot = c(10, 10)
  ),
  seed = 123
)

out$combined_field_book  # full MET field book
out$environment_summary  # per-environment design summary
out$efficiency_summary   # efficiency metrics (when eval_efficiency = TRUE)

Slot identity feasibility by J*, I, and r

The slot identity J* × r = I × k* requires that J* × r be exactly divisible by I. Whether this is achievable for a given combination of sparse treatments (J*), environments (I), and replication (r) depends on the shared factors of these three numbers.

The divisibility rule

For the slot identity to hold, I must divide J* × r exactly. Equivalently, r must be a multiple of I / gcd(I, J*): every prime factor of I that J* does not supply (counted with multiplicity) must come from r. This has a practical consequence for the most common case in plant breeding:

I = 4 environments, J* odd: J* × 1 is odd (not divisible by 4) and J* × 2 = 2 × odd is divisible by 2 but not by 4, so only r = 4 guarantees divisibility. But r = 4 gives k* = J* (every sparse line in every environment, i.e. full replication), which defeats the purpose of sparse testing. Practical fix: adjust C by 1 so that J* = J − C becomes even.
I = 4 environments, J* even but not divisible by 4: r = 2 always works (e.g. J* = 110: 110 × 2 / 4 = 55). If J* is divisible by 4, any r works.
I = 3 environments: feasibility depends on divisibility by 3. If J* is divisible by 3, any r works. Otherwise r must be a multiple of 3.
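I = 6 environments: requires divisibility by 2 × 3 = 6, so r must be a multiple of 6 / gcd(6, J*). Any r works when J* is divisible by 6; r = 3 is needed when J* is even but not divisible by 3; r = 2 when J* is odd but divisible by 3; and r = 6 (full replication) when J* is odd and not divisible by 3.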
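The rule reduces to one line of arithmetic: the smallest feasible r is I / gcd(I, J*), and every feasible r is a multiple of it. A minimal base-R sketch (hypothetical helpers, not package functions):

```r
# Hypothetical helpers: smallest r satisfying the slot identity for given J* and I
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)
min_feasible_r <- function(J_star, I) I / gcd(I, J_star)

min_feasible_r(110, 4)   # 2
min_feasible_r(75, 4)    # 4 = I, i.e. full replication (k* = J*)
min_feasible_r(70, 6)    # 3
min_feasible_r(60, 4)    # 1: any r works
```

Whenever the result equals I, the only design satisfying the identity has k* = J*, so the combination is not a sparse design at all.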

Feasibility table: r = 2

The table shows k* when the slot identity holds, and – when it does not for that combination. Add C (common treatments) to k* to get the n_test_entries_per_environment argument.

$J^{*}$	$I=3$	$I=4$	$I=5$	$I=6$	$I=7$	$I=8$	$I=9$	$I=10$
60	40	30	24	20	–	15	–	12
70	–	35	28	–	20	–	–	14
75	50	–	30	25	–	–	–	15
80	–	40	32	–	–	20	–	16
90	60	45	36	30	–	–	20	18
100	–	50	40	–	–	25	–	20
110	–	55	44	–	–	–	–	22
112	–	56	–	–	32	28	–	–
120	80	60	48	40	–	30	–	24
150	100	75	60	50	–	–	–	30
200	–	100	80	–	–	50	–	40

Feasibility table: r = 3

$J^{*}$	$I=3$	$I=4$	$I=5$	$I=6$	$I=7$	$I=8$	$I=9$	$I=10$
60	60	45	36	30	–	–	20	18
70	70	–	42	35	30	–	–	21
75	75	–	45	–	–	–	25	–
80	80	60	48	40	–	30	–	24
90	90	–	54	45	–	–	30	27
100	100	75	60	50	–	–	–	30
110	110	–	66	55	–	–	–	33
112	112	84	–	56	48	42	–	–
120	120	90	72	60	–	45	40	36
150	150	–	90	75	–	–	50	45
200	200	150	120	100	–	75	–	60
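Both tables can be regenerated, or extended to other J*, I and r, with a few lines of base R. The helper below is a hypothetical illustration, not a package function; `NA` marks the cells shown as – in the tables.

```r
# Hypothetical helper: k* = J* x r / I where the slot identity holds, NA elsewhere
feasibility_table <- function(J_star, I, r) {
  slots <- outer(J_star, I, function(j, i) j * r / i)       # candidate k*
  tab <- ifelse(slots == round(slots), round(slots), NA)    # NA where k* is fractional
  dimnames(tab) <- list(J_star = J_star, I = I)
  tab
}

J_star <- c(60, 70, 75, 80, 90, 100, 110, 112, 120, 150, 200)
feasibility_table(J_star, 3:10, r = 2)
feasibility_table(J_star, 3:10, r = 3)
```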

What to do when your combination gives –

Use check_balanced_incomplete_feasibility() to diagnose the problem and try one of these adjustments:

Adjust $C$ by 1: adding or removing one common treatment changes $J^{*}$ by 1, which may make $J^{*} \times r$ divisible by $I$ for the chosen $r$.
Try $r = 2$ instead of $r = 1$, or $r = 3$ instead of $r = 2$ — the extra factor may resolve the divisibility.
Use random_balanced (M3) if exact equal replication is not essential. M3 does not require the slot identity and tolerates odd $J^{*}$ freely.
Use allow_approximate = TRUE as a fallback — the allocation proceeds with the closest possible balance, accepting minor replication differences.
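The first adjustment can be automated. `nearest_feasible_C()` below is a hypothetical base-R helper (not part of OptiSparseMET) that searches outward from the current C, trying one more common treatment before one fewer, and returns the first C for which (J − C) × r is divisible by I, together with the resulting k.

```r
# Hypothetical helper: nearest number of common treatments C that makes the
# slot identity hold for fixed J, I and r. Not an OptiSparseMET function.
nearest_feasible_C <- function(J, I, r, C, max_shift = 5) {
  for (d in 0:max_shift) {
    for (cand in unique(c(C + d, C - d))) {          # try C + d before C - d
      if (cand >= 0 && cand < J && ((J - cand) * r) %% I == 0) {
        return(c(C = cand, k = cand + (J - cand) * r / I))
      }
    }
  }
  NA  # nothing feasible within max_shift
}

nearest_feasible_C(J = 83, I = 4, r = 2, C = 8)   # C = 9, k = 46
```

For the example that follows (J = 83, I = 4, r = 2, C = 8) it lands on the same fix, C = 9 and k = 46.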

## Quick check: is your combination feasible?
## J* = 75 (odd), I = 4, r = 2 -- should give --
check_balanced_incomplete_feasibility(
  n_treatments_total             = 83,   # J = J* + C = 75 + 8
  n_environments                 = 4,
  n_test_entries_per_environment = 30,   # k* guess: 30 - 8 = 22, 4*22=88 != 75*2=150
  target_replications            = 2,
  n_common_treatments            = 8
)
## feasible = FALSE -> adjust

## Fix: change C from 8 to 9 -> J* = 74 (even), r=2: 74*2/4 = 37
check_balanced_incomplete_feasibility(
  n_treatments_total             = 83,
  n_environments                 = 4,
  n_test_entries_per_environment = 46,   # k* = 37, k = 37+9 = 46
  target_replications            = 2,
  n_common_treatments            = 9
)
## feasible = TRUE

Design Strategy Notes

Use random_balanced when environment capacities differ substantially, when exact BIBD parameters are not achievable, or when some stochasticity in allocation is acceptable. Unlike the original M3 of Montesinos-Lopez et al. (2023), this implementation guarantees every treatment appears in at least one environment before replication filling begins.

Use balanced_incomplete with allow_approximate = FALSE (the default) when equal replication is a hard requirement – every sparse treatment must appear in exactly r environments. Verify the slot identity J* × r = I × k* first with check_balanced_incomplete_feasibility(). The function stops with a clear error if the identity does not hold, so you always know whether the equal-replication guarantee was met.

Use balanced_incomplete with allow_approximate = TRUE as a fallback when the slot identity cannot be satisfied for the chosen dimensions but you still want to attempt a balanced allocation. Some lines will receive more or fewer replications than r. This is an exploratory mode, not the primary path.

Use met_prep_famoptg() for designs requiring repeated checks in every block, partial replication (p-rep), or RCBD-type structure. The optimizer met_optimize_famoptg() accepts A, D, and CDmean criteria.

Use met_alpha_rc_stream() for fixed-grid field deployments where spatial row-column control is a priority. The optimizer met_optimize_alpha_rc() supports Random Restart, Simulated Annealing, and Genetic Algorithm methods.

Use GRM/A-based grouping when genomic prediction is a primary objective, when family labels are too coarse to capture relevant genetic structure, or when related lines risk clustering into the same environments.

Include common treatments whenever environments are weakly correlated, when genetic connectivity cannot otherwise be guaranteed, or when a stable reference set is needed for cross-environment benchmarking.

Installation

Install from GitHub with vignettes (recommended):

install.packages("remotes")
remotes::install_github("FAkohoue/OptiSparseMET",
  build_vignettes = TRUE,
  dependencies    = TRUE
)

Install without vignettes for a faster install:

remotes::install_github("FAkohoue/OptiSparseMET",
  build_vignettes = FALSE,
  dependencies    = TRUE
)

Documentation

Full documentation, function reference, and tutorials are available at:

https://FAkohoue.github.io/OptiSparseMET/

To read the vignette after installation:

vignette("OptiSparseMET-introduction", package = "OptiSparseMET")

Citation

If you use OptiSparseMET in published research, please cite:

Akohoue, F. (2026).
OptiSparseMET: Sparse Multi-Environment Trial Design with Flexible Local
Field Layout. R package version 0.1.0.
https://github.com/FAkohoue/OptiSparseMET

Reference

Montesinos-Lopez O.A., Mosqueda-Gonzalez B.A., Salinas-Ruiz J., Montesinos-Lopez A., Crossa J. (2023). Sparse multi-trait genomic prediction under balanced incomplete block design. The Plant Genome, 16, e20305. https://doi.org/10.1002/tpg2.20305

Contributing

Issues, bug reports, and feature suggestions are welcome: https://github.com/FAkohoue/OptiSparseMET/issues

License

MIT License (c) Félicien Akohoue
