Construct a stream-based repeated-check alpha row-column design
Source:R/alpha_rc_stream.R
alpha_rc_stream.Rdalpha_rc_stream() builds a randomised alpha-lattice incomplete block
design for breeding and agronomic experiments on a fixed field grid of
n_rows x n_cols plots. The field is converted to a one-dimensional
planting stream according to order and serpentine, partitioned into
n_reps contiguous replicate segments, and each replicate is further
divided into incomplete blocks. Every incomplete block contains all check
treatments plus a subset of entry treatments. Entry treatments appear
exactly once per replicate. Trailing unused field cells remain NA in the
layout matrix and field book.
Design evaluation has been separated from construction. This function
returns the field book, layout matrix, and design metadata only. To compute
A-, D-, and CDmean optimality criteria, call
evaluate_alpha_efficiency() on the returned field_book. To search for
an optimised design using Random Restart, Simulated Annealing, or a Genetic
Algorithm, call optimize_alpha_rc().
Usage
alpha_rc_stream(
check_treatments,
check_families,
entry_treatments,
entry_families,
n_reps,
n_rows,
n_cols,
order = "column",
serpentine = FALSE,
seed = NULL,
attempts = 5000,
warn_and_correct = TRUE,
fix_rows = TRUE,
cluster_source = c("Family", "GRM", "A"),
GRM = NULL,
A = NULL,
id_map = NULL,
cluster_method = c("kmeans", "hclust"),
cluster_seed = 1,
cluster_attempts = 25,
n_pcs_use = Inf,
n_blocks_per_rep = NULL,
min_block_size = NULL,
max_block_size = NULL,
check_placement = c("systematic", "random"),
check_position_pattern = c("spread", "corners_first"),
use_dispersion = FALSE,
dispersion_source = c("K", "A", "GRM"),
dispersion_radius = 1,
dispersion_iters = 2000,
dispersion_seed = 1,
K = NULL,
line_id_map = NULL,
verbose = TRUE
)Arguments
- check_treatments
Character vector of check treatment identifiers. Checks appear in every incomplete block of every replicate. Must not overlap with
entry_treatmentsand must not contain duplicates.- check_families
Character vector of the same length as
check_treatments. Family labels for checks, used for adjacency scoring. Always stored in theFamilycolumn;Gclusteris alwaysNAfor check plots.- entry_treatments
Character vector of non-check treatment identifiers. Each entry appears exactly once per replicate. Must not contain duplicates.
- entry_families
Character vector of the same length as
entry_treatments. Family labels for entries. Used for adjacency scoring whencluster_source = "Family"and to set the number of clusters whencluster_source %in% c("GRM", "A").- n_reps
Positive integer. Number of contiguous replicate segments.
- n_rows
Positive integer. Number of field rows.
- n_cols
Positive integer. Number of field columns.
- order
Character. Stream traversal direction:
"row"fills row by row;"column"fills column by column.- serpentine
Logical. If
TRUE, alternate rows (whenorder = "row") or alternate columns (whenorder = "column") traverse in reverse direction.- seed
Optional integer. If
NULL, a seed is drawn from1:.Machine$integer.maxand returned asseed_used. Controls all randomised steps including entry randomisation, within-block shuffling, check placement, and dispersion optimisation.- attempts
Positive integer. Number of within-block shuffling attempts to minimise adjacent same-group pairs. Exits early when score reaches zero.
- warn_and_correct
Logical. Retained for interface compatibility. Field capacity violations always raise an error.
- fix_rows
Logical. Retained for interface compatibility. Field geometry is always fixed.
- cluster_source
Character. Grouping source for adjacency scoring.
"Family"usesentry_familiesdirectly."GRM"and"A"derive clusters from PCA of the respective relationship matrix.- GRM
Optional square numeric matrix with rownames and colnames equal to line IDs. Required when
cluster_source = "GRM"or whenuse_dispersion = TRUEanddispersion_source = "GRM".- A
Optional square numeric matrix. Required when
cluster_source = "A"ordispersion_source = "A".- id_map
Optional data frame with columns
TreatmentandLineID. Required when treatment labels do not match relationship matrix rownames.- cluster_method
Character. Clustering algorithm:
"kmeans"usesstats::kmeans()withcluster_attemptsrandom starts;"hclust"uses Ward's D2 linkage.- cluster_seed
Integer. Seed for k-means initialisation, run in an isolated RNG scope.
- cluster_attempts
Positive integer. Number of k-means random restarts. Ignored when
cluster_method = "hclust".- n_pcs_use
Positive number or
Inf. Number of leading PCs for matrix-based clustering. Actual number used ismin(n_pcs_use, n_positive_eigenvalues - 1).- n_blocks_per_rep
Optional positive integer. If supplied, the exact number of incomplete blocks per replicate. Validated against the feasible range derived from
min_block_size,max_block_size, and field capacity. IfNULL, the block count is set automatically to \(b_\text{max}\).- min_block_size
Optional positive integer \(\geq c\). Minimum total block size (checks + entries). Sets an upper bound on \(b\).
- max_block_size
Optional positive integer \(\geq c\). Maximum total block size (checks + entries). Sets a lower bound on \(b\).
- check_placement
Character. Check position rule within blocks.
"systematic"distributes checks evenly;"random"draws positions uniformly without replacement.- check_position_pattern
Character. Retained for interface compatibility.
- use_dispersion
Logical. If
TRUE, apply a post-layout swap-based local dispersion optimisation to reduce pairwise relatedness among neighbouring non-check plots.- dispersion_source
Character. Matrix used for dispersion scoring:
"K","A", or"GRM". The corresponding argument must be non-NULL.- dispersion_radius
Positive integer. Chebyshev distance radius defining the neighbourhood for dispersion scoring.
- dispersion_iters
Non-negative integer. Number of random swap proposals for dispersion optimisation.
- dispersion_seed
Integer. Seed for the dispersion step, run in an isolated RNG scope. Defaults to
seed_usedwhenNULL.- K
Optional square numeric matrix. Used for dispersion scoring when
dispersion_source = "K".- line_id_map
Optional data frame with columns
TreatmentandLineID. Required when treatment labels do not matchrownames(K)or the dispersion matrix.- verbose
Logical. If
TRUE, prints a one-line summary of field usage, trailingNAplots, replicate sizes, block count and origin, block sizes in replicate 1, and active block-size bounds.
Value
A named list with four components:
layout_matrixCharacter matrix of dimension
n_rows x n_cols. Used cells contain treatment IDs; unused trailing cells areNA.field_bookData frame with one row per field plot. Columns:
Plot(stream position),Row,Column,Rep(NAfor unused),IBlock(global block index),BlockInRep(block index within rep),Treatment(NAfor unused),Family,Gcluster(NAwhencluster_source = "Family"or for check plots),Check(logical).design_infoNamed list:
n_rows,n_cols,total_plots,n_reps,n_checks,n_entries,n_blocks_per_rep(resolved),n_blocks_per_rep_user,min_block_size,max_block_size,min_entry_slots_per_block,max_entry_slots_per_block,rep_sizes,total_used_plots,trailing_na_plots,block_plan,block_meta,order,serpentine.seed_usedInteger. The random seed used.
Details
Design structure
Let:
\(r\) = number of replicates (
n_reps)\(c\) = number of checks (
length(check_treatments))\(v\) = number of entries (
length(entry_treatments))\(b\) = number of incomplete blocks per replicate
Each replicate contains all \(v\) entries exactly once and all \(c\) checks repeated once in each of the \(b\) incomplete blocks. Entries are split across blocks as evenly as possible - block entry counts differ by at most one. The number of used plots per replicate is:
$$\text{plots per rep} = v + b \times c$$
The total used plots in the design is:
$$\text{total used} = r \times (v + b \times c)$$
If the fixed field contains more cells than this total, unused trailing
cells remain NA. The function stops if the required plots exceed field
capacity.
Block-size parameterisation
Block-size constraints are expressed in terms of total block size (checks + entries) because in a repeated-check design the whole block is the operational unit. The mapping is:
$$\text{min\_entry\_slots} = \text{min\_block\_size} - c$$ $$\text{max\_entry\_slots} = \text{max\_block\_size} - c$$
For example, with \(c = 3\) checks, min_block_size = 19 and
max_block_size = 20 means each block holds between 16 and 17 entries plus
3 check positions. Both bounds must be \(\geq c\).
Block-count determination
After translating total-block-size limits into entry-slot limits, the number of incomplete blocks per replicate \(b\) must satisfy a feasible range \([b_\text{min},\, b_\text{max}]\):
- Lower bound from
max_block_size \(b \geq \lceil v / (\text{max\_block\_size} - c)\rceil\)
- Upper bound from
min_block_size \(b \leq \lfloor v / (\text{min\_block\_size} - c)\rfloor\)
- Upper bound from field capacity
\(b \leq \lfloor (\text{total\_plots}/r - v) / c \rfloor\)
The function stops with an informative error if
\(b_\text{min} > b_\text{max}\). At least one of n_blocks_per_rep,
min_block_size, or max_block_size must be supplied.
User-fixed mode (n_blocks_per_rep supplied): the value is validated
against the feasible range and all block-size constraints.
Automatic mode (n_blocks_per_rep = NULL): \(b\) is set to
\(b_\text{max}\) - the largest feasible number of blocks, producing the
smallest feasible blocks, which generally gives the best local control.
Field stream construction
The field is mapped to a one-dimensional stream by iterating over rows
(order = "row") or columns (order = "column"). When serpentine = TRUE,
alternate rows or columns reverse direction, producing a boustrophedon
planting path. Replicate segments occupy contiguous portions of the stream
and incomplete blocks occupy contiguous sub-segments within each replicate.
Within-block arrangement
Entries are randomised independently within each replicate, then split
across blocks. Checks are inserted at positions determined by
check_placement:
"systematic": positions computed asround(seq(1, block_size, length.out = n_checks + 2))[2:(n_checks + 1)], distributing checks as evenly as possible."random": positions drawn uniformly without replacement.
Within-block order is optimised over attempts random shuffles to minimise
the count of adjacent same-group treatment pairs. The search exits early
when the adjacency score reaches zero.
Grouping and adjacency scoring
The grouping label used for adjacency scoring is determined by
cluster_source:
"Family": usesentry_familiesandcheck_familiesdirectly.GclusterisNAfor all plots."GRM"or"A": derives clusters from the leading principal components of the relationship matrix. The number of clusters equalslength(unique(entry_families)). Cluster labels are prefixed"G"(GRM) or"A"(A matrix). Check treatments always usecheck_familiesand always haveGcluster = NA.
Dispersion optimisation
When use_dispersion = TRUE, a local swap search is applied after layout
construction. At each of dispersion_iters iterations, two non-check plots
are selected at random; the swap is accepted if it reduces the total
neighbourhood relatedness score - the sum of relationship matrix values
between all non-check neighbour pairs within Chebyshev distance
dispersion_radius. The matrix used is selected by dispersion_source.
See also
evaluate_alpha_efficiency() to compute A, D, and CDmean
optimality criteria on the returned field_book.
optimize_alpha_rc() to search for a criterion-optimal design using
Random Restart, Simulated Annealing, or a Genetic Algorithm.
Examples
## Basic construction: 167 entries, 3 checks, 3 reps
design <- alpha_rc_stream(
check_treatments = c("CHK1", "CHK2", "CHK3"),
check_families = c("CHECK", "CHECK", "CHECK"),
entry_treatments = paste0("G", 1:167),
entry_families = rep(paste0("F", 1:7), length.out = 167),
n_reps = 3,
n_rows = 30,
n_cols = 20,
min_block_size = 19,
max_block_size = 20
)
#> Fixed field = 30 x 20; used plots = 591; trailing NA plots = 9; replicate used sizes = {197, 197, 197}; blocks/rep = 10 (derived); block sizes in rep 1 = {20, 20, 20, 20, 20, 20, 20, 19, 19, 19}; min block size = 19; max block size = 20
dim(design$layout_matrix) # 30 x 20
#> [1] 30 20
head(design$field_book)
#> Plot Row Column Rep IBlock BlockInRep Treatment Family Gcluster Check
#> 1 1 1 1 1 1 1 G67 F4 <NA> FALSE
#> 2 2 2 1 1 1 1 G17 F3 <NA> FALSE
#> 3 3 3 1 1 1 1 G127 F1 <NA> FALSE
#> 4 4 4 1 1 1 1 G143 F3 <NA> FALSE
#> 5 5 5 1 1 1 1 G61 F5 <NA> FALSE
#> 6 6 6 1 1 1 1 CHK1 CHECK <NA> TRUE
design$design_info$n_blocks_per_rep
#> [1] 10
design$design_info$trailing_na_plots
#> [1] 9
## Evaluate separately
eff <- evaluate_alpha_efficiency(
field_book = design$field_book,
n_rows = 30,
n_cols = 20,
check_treatments = c("CHK1", "CHK2", "CHK3"),
treatment_effect = "fixed",
residual_structure = "AR1xAR1",
rho_row = 0.10, rho_col = 0.10
)
eff$A_criterion
#> [1] 0.7472387
eff$D_criterion
#> [1] 0.3534374