Search for a criterion-optimal alpha-lattice design

optimize_alpha_rc() wraps alpha_rc_stream() and evaluate_alpha_efficiency() in an optimisation loop that searches for the design with the best optimality criterion value among many candidate randomisations. Three search strategies are available:

RS (Random Restart): generate n_restarts independent designs and return the best by criterion value.
SA (Simulated Annealing): start from a random design and iteratively propose entry permutation swaps, accepting improvements always and degradations with probability $\exp(-\Delta / T)$, where $T$ cools from sa_temp_start to sa_temp_end over sa_max_iter iterations. Repeated across n_restarts independent restarts.
GA (Genetic Algorithm): maintain a population of ga_pop_size entry permutations and evolve it using tournament selection, Order Crossover (OX1), random swap mutation, and elitism.

Constraint preservation: all candidate designs are generated by alpha_rc_stream(), so family/cluster adjacency structure, genetic dispersion, check placement, and alpha-lattice block balance are preserved by construction in all three methods - no penalty terms are required.

Integrity guarantee: every candidate design passes a structural integrity check before being scored or stored as the best. The running best is only updated when both the criterion score improves and a re-check passes. A final integrity check is performed on the stored best before returning to the user. If no valid design is found after all iterations, an emergency fallback returns a single freshly constructed valid design.

Usage

optimize_alpha_rc(
  check_treatments,
  check_families,
  entry_treatments,
  entry_families,
  n_reps,
  n_rows,
  n_cols,
  order = "column",
  serpentine = FALSE,
  seed = NULL,
  attempts = 5000,
  warn_and_correct = TRUE,
  fix_rows = TRUE,
  cluster_source = c("Family", "GRM", "A"),
  GRM = NULL,
  A = NULL,
  id_map = NULL,
  cluster_method = c("kmeans", "hclust"),
  cluster_seed = 1,
  cluster_attempts = 25,
  n_pcs_use = Inf,
  n_blocks_per_rep = NULL,
  min_block_size = NULL,
  max_block_size = NULL,
  check_placement = c("systematic", "random"),
  check_position_pattern = c("spread", "corners_first"),
  use_dispersion = FALSE,
  dispersion_source = c("K", "A", "GRM"),
  dispersion_radius = 1,
  dispersion_iters = 2000,
  dispersion_seed = 1,
  K = NULL,
  line_id_map = NULL,
  treatment_effect = c("random", "fixed"),
  prediction_type = c("IID", "GBLUP", "PBLUP", "none"),
  check_as_fixed = TRUE,
  residual_structure = c("IID", "AR1", "AR1xAR1"),
  rho_row = 0,
  rho_col = 0,
  varcomp = list(sigma_e2 = 1, sigma_g2 = 1, sigma_rep2 = 1, sigma_ib2 = 1, sigma_r2 = 1,
    sigma_c2 = 1),
  spatial_engine = c("auto", "sparse", "dense"),
  dense_max_n = 5000,
  eff_trace_samples = 80,
  eff_full_max = 400,
  method = c("RS", "SA", "GA"),
  criterion = c("A", "D", "both", "CDmean"),
  n_restarts = 50L,
  sa_max_iter = 1000L,
  sa_temp_start = 1,
  sa_temp_end = 0.001,
  sa_cooling = c("exponential", "linear"),
  sa_swap_scope = c("global", "within_rep", "within_block"),
  ga_pop_size = 20L,
  ga_n_generations = 50L,
  ga_crossover_rate = 0.7,
  ga_mutation_rate = 0.2,
  ga_elitism = 2L,
  max_failure_rate_rs = 0.5,
  max_failure_rate_sa = 0.2,
  ga_invalid_retries = 3L,
  verbose_opt = TRUE
)

Arguments

check_treatments, check_families, entry_treatments, entry_families, n_reps, n_rows, n_cols, order, serpentine, seed, attempts, warn_and_correct, fix_rows, cluster_source, GRM, A, id_map, cluster_method, cluster_seed, cluster_attempts, n_pcs_use, n_blocks_per_rep, min_block_size, max_block_size, check_placement, check_position_pattern, use_dispersion, dispersion_source, dispersion_radius, dispersion_iters, dispersion_seed, K, line_id_map

Passed directly to alpha_rc_stream(). See that function for full documentation.

treatment_effect, prediction_type, check_as_fixed, residual_structure, rho_row, rho_col, varcomp, spatial_engine, dense_max_n, eff_trace_samples, eff_full_max

Passed directly to evaluate_alpha_efficiency(). See that function for full documentation.

method

Character. Optimisation method:

"RS": Random Restart. Generates n_restarts independent designs and returns the one with the best criterion score. Simple, parallelisable, and robust - every candidate is a fully valid design by construction.
"SA": Simulated Annealing. Iteratively proposes entry permutation swaps, accepting improvements always and degradations with a temperature-governed probability. Repeated across n_restarts independent restarts. Better at escaping local optima than pure random restart.
"GA": Genetic Algorithm. Maintains a population of entry permutations and evolves it using OX1 crossover, swap mutation, and elitism. Most powerful global search but also most computationally intensive.

criterion

Character. Optimality criterion to drive the search:

"A": Minimise A_criterion (mean pairwise contrast variance for fixed effects; mean PEV for random effects). Valid for both treatment_effect = "fixed" and "random". Recommended default.
"D": Minimise D_criterion (geometric mean of contrast covariance eigenvalues). Fixed effects only. Falls back to A_criterion with a warning for random effects.
"both": Minimise the mean of A_criterion and D_criterion. Fixed effects only. Falls back to A_criterion when D is unavailable.
"CDmean": Maximise CDmean (mean coefficient of determination for GEBV prediction; Rincent et al. 2012). Requires treatment_effect = "random" and prediction_type %in% c("IID", "GBLUP", "PBLUP"). Best suited for optimising training population designs for genomic selection.

n_restarts

Positive integer. Number of independent random restarts. Used by all three methods: RS runs n_restarts full designs; SA and GA each restart their search n_restarts times from a new random initial design.

sa_max_iter

Positive integer. Number of swap proposals per SA restart. Each proposal that produces a valid design undergoes Metropolis acceptance; invalid proposals are counted but do not affect the acceptance logic.

sa_temp_start

Positive numeric. Initial SA temperature $T_0$. Higher values allow more degrading moves to be accepted early in the search, improving escape from local optima at the cost of less directed search.

sa_temp_end

Positive numeric. Final SA temperature. Must be strictly less than sa_temp_start. Low values ensure the search converges to a local optimum by the end of each restart.

sa_cooling

Character. SA cooling schedule:

"exponential": Geometric decay: $T_k = T_0 \cdot (T_\text{end}/T_0)^{k/\text{max\_iter}}$. Recommended - spends more iterations at low temperature where refinement occurs.
"linear": Linear decay from sa_temp_start to sa_temp_end.

sa_swap_scope

Character. Scope of entry swap proposals during SA:

"global": Any two non-check entries anywhere in the design may be swapped. Has the largest impact on the criterion because it changes which entries co-occur in blocks across replicates. Recommended for most use cases.
"within_rep": Swaps restricted to entries within the same replicate. Preserves rep-level balance but limits criterion improvement compared with global swaps.
"within_block": Swaps restricted to entries within the same incomplete block. Has minimal criterion impact under IID residuals because the C matrix is invariant to within-block reordering. Only marginally useful under strong spatial correlation (AR1, AR1xAR1).

ga_pop_size

Positive integer $\geq 4$. Number of individuals in the GA population. Larger populations improve coverage of the design space but increase the cost per generation linearly.

ga_n_generations

Positive integer. Number of GA generations per run. Each generation evaluates up to ga_pop_size - ga_elitism new offspring.

ga_crossover_rate

Numeric in $[0, 1]$. Probability of applying OX1 crossover to produce an offspring. When crossover is not applied, the first parent's permutation is passed directly to mutation. Values between 0.6 and 0.9 are typical.

ga_mutation_rate

Numeric in $[0, 1]$. Probability of applying a random swap mutation to an offspring permutation. Provides diversity; values between 0.1 and 0.3 are typical.

ga_elitism

Non-negative integer strictly less than ga_pop_size. Number of best individuals copied unchanged to the next generation. Prevents loss of the best solution found so far. Values of 1 or 2 are usually sufficient.

max_failure_rate_rs

Numeric in $[0, 1]$. Maximum tolerated fraction of RS restarts that may fail construction or integrity checking before the function stops with a diagnostic error. Default 0.5 (stop if more than half of restarts fail). Below this threshold a warning is issued instead. Increase for very constrained field geometries where some failure is expected.

max_failure_rate_sa

Numeric in $[0, 1]$. Maximum tolerated fraction of SA swap proposals per restart that may produce invalid designs before a per-restart warning is issued. Default 0.2. This does not stop the function - it is informational. Invalid swaps are counted but treated as neutral (the current design is retained without invoking the Metropolis criterion).

ga_invalid_retries

Non-negative integer. Number of additional random permutation attempts made when a GA offspring fails integrity checking. Default 3. If all retries also fail, the population slot receives Inf fitness and participates in no future tournament selection.

verbose_opt

Logical. If TRUE, prints per-restart or per-generation progress messages including current score, running best, and failure counts. Set to FALSE for batch or parallel runs.

Value

The return value of alpha_rc_stream() for the best valid design found, augmented with two additional top-level components:

efficiency

The full output of evaluate_alpha_efficiency() for the best design, including all applicable criterion values: A_criterion, D_criterion, A_efficiency, D_efficiency, CDmean, and CD_per_line.

optimization

Named list of optimisation metadata:

method: Character. "RS", "SA", or "GA".
criterion: Character. As supplied.
best_score: Numeric. Best criterion value found, reported in the natural direction for the user. For criterion = "A", "D", or "both": lower is better. For criterion = "CDmean": higher is better (positive value, negation is internal only).
score_history: For RS and GA: numeric vector of length n_restarts or ga_n_generations respectively, tracking the best criterion value at each step. NA entries indicate failed iterations. For SA: a list of n_restarts numeric vectors, each of length sa_max_iter, tracking the current score at each iteration within that restart. For criterion = "CDmean", all values are reported as positive CDmean.
master_seed: Integer. The master random seed used.
n_failed: Integer. Number of failed candidates (RS: failed restarts; SA: failed initial designs across restarts; GA: total invalid offspring requiring retry or fallback to Inf fitness). NA if not tracked for the method.
n_restarts: Integer or NULL. The value of n_restarts. RS only; NULL for SA and GA.
sa_params: Named list or NULL. SA parameters: n_restarts, max_iter, temp_start, temp_end, cooling, swap_scope. SA only.
ga_params: Named list or NULL. GA parameters: pop_size, n_generations, crossover_rate, mutation_rate, elitism, invalid_retries. GA only.

If no valid optimised design is found after all iterations, a warning is issued and the function returns a single freshly constructed valid design via the emergency fallback (see Integrity checking strategy above). If even the emergency fallback fails after 10 attempts, the function stops with an informative error message.

Details

Optimisation target

All methods minimise an internal score (lower = better internally):

`criterion`	Internal score	Direction
`"A"`	`A_criterion`	Lower is better
`"D"`	`D_criterion`	Lower is better
`"both"`	Mean of `A_criterion` and `D_criterion`	Lower is better
`"CDmean"`	Negated CDmean	Higher CDmean is better

For "CDmean", the internal negation is transparent to the user: best_score and score_history in the output are always reported as positive CDmean values (higher = better). Requires treatment_effect = "random".

Entry permutation encoding

For SA and GA, designs are encoded as permutations of seq_along(entry_treatments). A permutation is applied by passing entry_treatments[perm] and entry_families[perm] to alpha_rc_stream(), which then randomises and allocates entries according to its internal logic. This means the permutation influences the treatment-to-block assignment while alpha_rc_stream() handles all within-block arrangement, check placement, and dispersion.

Integrity checking strategy

Three structural constraints are verified for every candidate design:

No entry appears more than once within any replicate.
All expected entry treatments are present in the field book.
All check treatments are present and correctly flagged as checks.

These checks are performed at four points in the pipeline:

Point 1 - after construction: Inside run_pipeline(), immediately after alpha_rc_stream() returns. Failures discard the candidate before scoring.
Point 2 - before storing as best: Inside .update_best(), as a second independent check. Prevents a candidate that passed Point 1 from being stored if its state was corrupted between construction and comparison.
Point 3 - final check before return: After the optimisation loop ends, the stored best is re-checked one final time. If it fails, the emergency fallback is triggered.
Point 4 - emergency fallback: If no valid design is found, up to 10 fresh random constructions are attempted. The first valid one is returned with a warning. If all 10 fail, the function stops with a diagnostic message.

Failure handling by method

RS: failed restarts (construction error or integrity failure) are counted and reported. If the failure rate exceeds max_failure_rate_rs, the function stops with a diagnostic message. Below the threshold a warning is issued.

SA: invalid swap proposals (NULL candidate) are counted per restart but are treated as neutral events - the current design is retained, the temperature counter advances, and the Metropolis acceptance logic is not invoked. This prevents invalid proposals from distorting the acceptance rate. If the invalid rate within a restart exceeds max_failure_rate_sa, a per-restart warning is issued.

GA: when an offspring fails integrity, up to ga_invalid_retries fresh random permutations are tried as replacements. This prevents population slots from being occupied by Inf-fitness placeholders that waste tournament selection capacity. The total count of invalid offspring is reported in a warning and stored in $optimization$n_failed.

Simulated Annealing temperature schedule

"exponential": $T_k = T_0 \cdot \alpha^k$ where $\alpha = (T_\text{end} / T_0)^{1 / \text{max\_iter}}$
"linear": $T_k$ decreases linearly from sa_temp_start to sa_temp_end over sa_max_iter steps

Genetic Algorithm operators

Selection: tournament of size 2 over valid (finite-fitness) individuals only
Crossover: Order Crossover (OX1) - always produces a valid permutation of seq_along(entry_treatments)
Mutation: random swap of two positions in the permutation
Elitism: top ga_elitism individuals copied unchanged to next generation

References

Rincent, R., Laloe, D., Nicolas, S., Altmann, T., Brunel, D., Revilla, P., ... & Moreau, L. (2012). Maximizing the reliability of genomic selection by optimizing the calibration set of reference individuals: comparison of methods in two diverse groups of maize inbreds (Zea mays L.). Genetics, 192(2), 715-728.

Jones, B., Allen-Moyer, K., & Goos, P. (2021). A-optimal versus D-optimal design of screening experiments. Journal of Quality Technology, 53(4), 369-382.

Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220(4598), 671-680.

Holland, J. H. (1992). Adaptation in Natural and Artificial Systems. MIT Press.

Examples

## -- Random Restart: fixed effects, A-criterion -----------------------------
result_rs <- optimize_alpha_rc(
  check_treatments   = c("CHK1", "CHK2", "CHK3"),
  check_families     = c("CHECK", "CHECK", "CHECK"),
  entry_treatments   = paste0("G", 1:167),
  entry_families     = rep(paste0("F", 1:7), length.out = 167),
  n_reps             = 3,
  n_rows             = 30,
  n_cols             = 20,
  min_block_size     = 19,
  max_block_size     = 20,
  treatment_effect   = "fixed",
  residual_structure = "AR1xAR1",
  rho_row            = 0.10,
  rho_col            = 0.10,
  method             = "RS",
  criterion          = "A",
  n_restarts         = 50
)
#> 
#> [optimize_alpha_rc] method = 'RS' | criterion = 'A' | master seed = 554108441
#> [RS] 50 restarts | criterion = 'A'
#>   [RS] 1/50 | score = 0.746080 | best = 0.746080
#>   [RS] 2/50 | score = 0.746095 | best = 0.746080
#>   [RS] 3/50 | score = 0.744183 | best = 0.744183
#>   [RS] 4/50 | score = 0.744390 | best = 0.744183
#>   [RS] 5/50 | score = 0.747536 | best = 0.744183
#>   [RS] 6/50 | score = 0.745166 | best = 0.744183
#>   [RS] 7/50 | score = 0.748252 | best = 0.744183
#>   [RS] 8/50 | score = 0.746552 | best = 0.744183
#>   [RS] 9/50 | score = 0.745492 | best = 0.744183
#>   [RS] 10/50 | score = 0.746225 | best = 0.744183
#>   [RS] 11/50 | score = 0.743898 | best = 0.743898
#>   [RS] 12/50 | score = 0.745469 | best = 0.743898
#>   [RS] 13/50 | score = 0.746307 | best = 0.743898
#>   [RS] 14/50 | score = 0.745892 | best = 0.743898
#>   [RS] 15/50 | score = 0.745473 | best = 0.743898
#>   [RS] 16/50 | score = 0.746171 | best = 0.743898
#>   [RS] 17/50 | score = 0.748523 | best = 0.743898
#>   [RS] 18/50 | score = 0.746380 | best = 0.743898
#>   [RS] 19/50 | score = 0.746538 | best = 0.743898
#>   [RS] 20/50 | score = 0.747267 | best = 0.743898
#>   [RS] 21/50 | score = 0.746706 | best = 0.743898
#>   [RS] 22/50 | score = 0.746524 | best = 0.743898
#>   [RS] 23/50 | score = 0.747023 | best = 0.743898
#>   [RS] 24/50 | score = 0.745524 | best = 0.743898
#>   [RS] 25/50 | score = 0.747626 | best = 0.743898
#>   [RS] 26/50 | score = 0.744816 | best = 0.743898
#>   [RS] 27/50 | score = 0.745067 | best = 0.743898
#>   [RS] 28/50 | score = 0.746952 | best = 0.743898
#>   [RS] 29/50 | score = 0.745957 | best = 0.743898
#>   [RS] 30/50 | score = 0.745420 | best = 0.743898
#>   [RS] 31/50 | score = 0.744651 | best = 0.743898
#>   [RS] 32/50 | score = 0.747181 | best = 0.743898
#>   [RS] 33/50 | score = 0.745141 | best = 0.743898
#>   [RS] 34/50 | score = 0.747325 | best = 0.743898
#>   [RS] 35/50 | score = 0.744570 | best = 0.743898
#>   [RS] 36/50 | score = 0.746746 | best = 0.743898
#>   [RS] 37/50 | score = 0.744129 | best = 0.743898
#>   [RS] 38/50 | score = 0.747262 | best = 0.743898
#>   [RS] 39/50 | score = 0.746573 | best = 0.743898
#>   [RS] 40/50 | score = 0.743288 | best = 0.743288
#>   [RS] 41/50 | score = 0.743270 | best = 0.743270
#>   [RS] 42/50 | score = 0.747472 | best = 0.743270
#>   [RS] 43/50 | score = 0.747826 | best = 0.743270
#>   [RS] 44/50 | score = 0.745486 | best = 0.743270
#>   [RS] 45/50 | score = 0.745246 | best = 0.743270
#>   [RS] 46/50 | score = 0.745594 | best = 0.743270
#>   [RS] 47/50 | score = 0.743258 | best = 0.743258
#>   [RS] 48/50 | score = 0.746165 | best = 0.743258
#>   [RS] 49/50 | score = 0.744582 | best = 0.743258
#>   [RS] 50/50 | score = 0.745752 | best = 0.743258
#> 
#> [optimize_alpha_rc] Done. Best A = 0.743258 (lower is better)

result_rs$efficiency$A_criterion      # best A-criterion found
#> [1] 0.7432582
result_rs$optimization$best_score     # same value, lower is better
#> [1] 0.7432582
result_rs$optimization$score_history  # per-restart scores
#>  [1] 0.7460799 0.7460952 0.7441834 0.7443899 0.7475358 0.7451657 0.7482523
#>  [8] 0.7465523 0.7454921 0.7462245 0.7438977 0.7454687 0.7463071 0.7458923
#> [15] 0.7454729 0.7461707 0.7485230 0.7463804 0.7465380 0.7472674 0.7467064
#> [22] 0.7465238 0.7470233 0.7455243 0.7476257 0.7448157 0.7450670 0.7469525
#> [29] 0.7459570 0.7454197 0.7446512 0.7471813 0.7451407 0.7473252 0.7445698
#> [36] 0.7467458 0.7441288 0.7472623 0.7465734 0.7432878 0.7432697 0.7474718
#> [43] 0.7478260 0.7454857 0.7452461 0.7455942 0.7432582 0.7461649 0.7445821
#> [50] 0.7457524
result_rs$optimization$n_failed       # number of failed restarts
#> [1] 0

if (FALSE) { # \dontrun{
## -- Simulated Annealing: random effects, CDmean ----------------------------
result_sa <- optimize_alpha_rc(
  check_treatments   = c("CHK1", "CHK2", "CHK3"),
  check_families     = c("CHECK", "CHECK", "CHECK"),
  entry_treatments   = paste0("G", 1:167),
  entry_families     = rep(paste0("F", 1:7), length.out = 167),
  n_reps             = 3,
  n_rows             = 30,
  n_cols             = 20,
  min_block_size     = 19,
  max_block_size     = 20,
  treatment_effect   = "random",
  prediction_type    = "GBLUP",
  K                  = my_kinship_matrix,
  varcomp            = list(
    sigma_g2   = 0.4, sigma_e2   = 0.6,
    sigma_rep2 = 0.1, sigma_ib2  = 0.05,
    sigma_r2   = 0.02, sigma_c2  = 0.02
  ),
  method             = "SA",
  criterion          = "CDmean",
  n_restarts         = 5,
  sa_max_iter        = 500,
  sa_temp_start      = 0.05,
  sa_temp_end        = 0.001,
  sa_cooling         = "exponential",
  sa_swap_scope      = "global",
  max_failure_rate_sa = 0.3
)

result_sa$efficiency$CDmean           # mean prediction reliability [0, 1]
result_sa$optimization$best_score     # positive CDmean, higher is better
result_sa$optimization$n_failed       # failed SA initial designs

## -- Genetic Algorithm: fixed effects, D-criterion --------------------------
result_ga <- optimize_alpha_rc(
  check_treatments   = c("CHK1", "CHK2", "CHK3"),
  check_families     = c("CHECK", "CHECK", "CHECK"),
  entry_treatments   = paste0("G", 1:167),
  entry_families     = rep(paste0("F", 1:7), length.out = 167),
  n_reps             = 3,
  n_rows             = 30,
  n_cols             = 20,
  min_block_size     = 19,
  max_block_size     = 20,
  treatment_effect   = "fixed",
  residual_structure = "AR1xAR1",
  rho_row            = 0.10,
  rho_col            = 0.10,
  method             = "GA",
  criterion          = "D",
  n_restarts         = 3,
  ga_pop_size        = 20,
  ga_n_generations   = 50,
  ga_crossover_rate  = 0.7,
  ga_mutation_rate   = 0.2,
  ga_elitism         = 2,
  ga_invalid_retries = 3
)

result_ga$efficiency$D_criterion      # best D-criterion, lower is better
result_ga$optimization$score_history  # per-generation best D-criterion
result_ga$optimization$n_failed       # total invalid offspring
} # }