In multi-environment trial analysis and plant breeding experiments, linear random effects models (hereafter LRE models) are often used as genomic prediction models. In this study, they were compared with machine learning techniques, following the models outlined in Jarquín et al. (2014). In particular, GxE can be modeled with a covariance function equal to the product of two random linear functions of markers and of environmental covariates, which is equivalent to a reaction norm model (Jarquín et al., 2014). An environment always refers to a Site x Year combination.
Main effects models
(1) Model G + E: Marker + Environment Main Effects (baseline model)
The response variable is modeled as the sum of an overall mean ($\mu$), a random deviation due to the $i$th environment ($E_i$), the genotypic random effect of the $j$th hybrid genotype based on marker covariates ($g_j$, G-BLUP component), and an error term ($\varepsilon_{ij}$):
$$y_{ij} = \mu + E_i + g_j + \varepsilon_{ij}$$
where $E_i \overset{\text{IID}}{\sim} N(0, \sigma^2_E)$ and $\varepsilon_{ij} \overset{\text{IID}}{\sim} N(0, \sigma^2_\varepsilon)$; $N(\cdot,\cdot)$ denotes a normally distributed random variable, IID stands for independent and identically distributed, $\sigma^2_E$ and $\sigma^2_g$ are the environmental and genomic variances, respectively, and $\sigma^2_\varepsilon$ is the residual variance.
$g_j$ corresponds to a regression on marker covariates of the form $g_j = \sum_{m=1}^{p} x_{jm} b_m$, a linear combination of $p$ markers and their respective marker effects. Marker effects were regarded as IID draws from normal distributions of the form $b_m \sim N(0, \sigma^2_b)$, $m = 1,\dots,p$. The vector $g = Xb$ follows a multivariate normal density with null mean and covariance matrix $G\sigma^2_g$, where $G = XX'/p$ is the genomic relationship matrix, $X$ is the centered and standardized genotype matrix, and $p$ is the total number of markers.
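As an illustration of the genomic relationship matrix described above, the following minimal sketch computes $G = XX'/p$ from a toy marker matrix. The data, the function names, and the use of the population standard deviation for standardization are assumptions made for this example, not details taken from the study.

```python
from statistics import mean, pstdev

def standardize_columns(M):
    """Center each marker column to mean 0 and scale it to unit variance.

    Assumption: the population standard deviation is used; a monomorphic
    marker (zero variance) would need to be filtered out beforehand.
    """
    cols = list(zip(*M))
    stats = [(mean(c), pstdev(c)) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in M]

def genomic_relationship(X):
    """Return G = X X' / p for an n x p standardized marker matrix X."""
    p = len(X[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / p for rj in X] for ri in X]

# Hypothetical toy data: 3 hybrids x 4 markers coded as allele counts (0/1/2).
M = [[0, 1, 2, 0],
     [1, 1, 0, 2],
     [2, 0, 1, 1]]

X = standardize_columns(M)
G = genomic_relationship(X)

for row in G:
    print([round(v, 3) for v in row])
```

With this scaling, the diagonal of $G$ averages to 1, so $\sigma^2_g$ can be read as the genomic variance on the scale of the phenotype.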
(2) Model G + S: Marker + Site Main Effects
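This model makes it possible to gain information from a site evaluated over several years, as it includes the site effect $S_k$:
$$y_{kj} = \mu + S_k + g_j + \varepsilon_{kj}$$
Here $y_{kj}$ corresponds to the phenotypic response of the $j$th genotype in the $k$th site, with $S_k \overset{\text{IID}}{\sim} N(0, \sigma^2_S)$, $k = 1,\dots,K$.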
(3) Model G+E+W: Marker + Environment + Environmental Covariates Main Effects
This model additionally incorporates the main effect of the environmental covariates (ECs), including the longitude and latitude coordinates. The environmental effects can be modeled by a random regression on the ECs ($W$), which represent the environmental conditions experienced by each hybrid in each environment: $w_{ij} = \sum_{q=1}^{Q} W_{ijq}\gamma_q$, where $W_{ijq}$ is the value of the $q$th EC evaluated in the $ij$th environment x hybrid combination, $\gamma_q$ is the main effect of the corresponding EC, and $Q$ is the total number of ECs. We considered the effects of the ECs as IID draws from normal densities, i.e., $\gamma_q \overset{\text{IID}}{\sim} N(0, \sigma^2_\gamma)$. Consequently, the vector $w = W\gamma$ follows a multivariate normal distribution with null mean and covariance matrix $\Omega\sigma^2_w$, where $\Omega \propto WW'$, and the matrix $W$, which is centered and standardized, contains the values of the ECs. The model then becomes:
$$y_{ij} = \mu + E_i + g_j + w_{ij} + \varepsilon_{ij}$$
with $w \sim N(0, \Omega\sigma^2_w)$.
In this model, as explained in Jarquín et al. (2014), environmental effects are subdivided into two components: one that originates from the regression on numeric environmental variables, and one due to deviations of the Year-Site combination effect that cannot be accounted for by the ECs. Indeed, the environmental variables might not fully explain the differences across environments. Modeling the covariance matrices $\Omega$ and $G$ makes it possible to borrow information between environments and between hybrid genotypes, respectively.
Models with interaction
(4) Model G+E+GxE: main effects G+E with Genomic x Environment Interaction
The model G+E was extended by including the interaction term between environments and markers (GxE):
$$y_{ij} = \mu + E_i + g_j + gE_{ij} + \varepsilon_{ij}$$
with $gE \sim N(0, [Z_g G Z_g'] \circ [Z_E Z_E']\,\sigma^2_{gE})$, where $Z_g$ and $Z_E$ are the design matrices that connect the phenotype entries with hybrid genotypes and with environments, respectively; $\sigma^2_{gE}$ is the variance component of the $gE_{ij}$ interaction term; and $\circ$ denotes the Hadamard product between two matrices.
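To make the Hadamard-product covariance of the GxE term concrete, here is a small sketch that builds $[Z_g G Z_g'] \circ [Z_E Z_E']$ for three phenotype records. The relationship matrix, the record layout, and all helper names are hypothetical and chosen only for illustration.

```python
def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def incidence(labels, n_levels):
    """Design matrix: one row per record, a 1 in the column of its level."""
    return [[1.0 if lab == k else 0.0 for k in range(n_levels)] for lab in labels]

def hadamard(A, B):
    """Element-wise (Hadamard) product of two equally sized matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Hypothetical genomic relationship matrix for 2 hybrids.
G = [[1.0, 0.5],
     [0.5, 1.0]]

# Three records: (hybrid 0, env 0), (hybrid 1, env 0), (hybrid 0, env 1).
hybrid_of_record = [0, 1, 0]
env_of_record = [0, 0, 1]

Zg = incidence(hybrid_of_record, 2)
ZE = incidence(env_of_record, 2)

K_g = matmul(matmul(Zg, G), transpose(Zg))   # genomic covariance between records
K_E = matmul(ZE, transpose(ZE))              # 1 if two records share an environment
K_gE = hadamard(K_g, K_E)                    # covariance structure of the gE term

for row in K_gE:
    print(row)
```

In this toy case, two records covary through the interaction term only when they come from the same environment, and then in proportion to the genomic relationship of their hybrids.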
(5) Model G+S+GxS: main effects G+S with Genomic x Site Interaction
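Similar to the previous model, this model extends model G+S by including the interaction term between sites and markers (GxS):
$$y_{kj} = \mu + S_k + g_j + gS_{kj} + \varepsilon_{kj}$$
where $gS \sim N(0, [Z_g G Z_g'] \circ [Z_S Z_S']\,\sigma^2_{gS})$, and $Z_S$ and $\sigma^2_{gS}$ are the design matrix for sites and the variance component associated with this interaction, respectively.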
(6) Model G+E+S+Y+GxS+GxY+GxE: main effects G+E+S+Y with Genomic x Environment Interaction, Genomic x Site Interaction and Genomic x Year Interaction
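This model is the most complete model that uses only basic information about environments (year and site) to model GxE:
$$y_{ij} = \mu + E_i + S_k + Y_l + g_j + gS_{kj} + gY_{lj} + gE_{ij} + \varepsilon_{ij}$$
where environment $i$ is the combination of site $k$ and year $l$, $gY \sim N(0, [Z_g G Z_g'] \circ [Z_Y Z_Y']\,\sigma^2_{gY})$, and $Z_Y$ and $\sigma^2_{gY}$ are the design matrix for years and the variance component associated with this interaction, respectively.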
(7) Model G+E+W+GxW: main effects G+E+W with interactions between markers and environmental covariates
The model G+E+W was extended by adding the interaction between genomic markers and environmental covariates. Jarquín et al. (2014) demonstrated that the interaction term induced by the reaction-norm model can be described by a covariance structure which corresponds, under standard assumptions, to the Hadamard product of two covariance structures: one characterizing the relationships between lines based on marker information (e.g., $G$), and one describing the environmental resemblance based on ECs (e.g., $\Omega$). The vector of random effects $gw$, which represents the interaction terms between markers and ECs, is assumed to follow a multivariate normal distribution with null mean and covariance structure $[Z_g G Z_g'] \circ \Omega\,\sigma^2_{gw}$. The model can be expressed as follows:
$$y_{ij} = \mu + E_i + g_j + w_{ij} + gw_{ij} + \varepsilon_{ij}$$
with $gw \sim N(0, [Z_g G Z_g'] \circ \Omega\,\sigma^2_{gw})$.
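The covariance structure of the marker x EC interaction can be sketched in the same spirit. The example below forms $\Omega$ from a toy matrix of standardized ECs and multiplies it element-wise with a record-level genomic covariance. The source states only that $\Omega \propto WW'$; dividing by the number of ECs $Q$ is an assumption made here, and the data and function names are hypothetical.

```python
def scaled_cross_product(W):
    """Return W W' / Q for a records x Q matrix of standardized covariates.

    Assumption: the proportionality constant in Omega ∝ W W' is taken as 1/Q.
    """
    q = len(W[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / q for rj in W] for ri in W]

def hadamard(A, B):
    """Element-wise (Hadamard) product of two equally sized matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Hypothetical record-level genomic covariance Zg G Zg' for 4 records:
# (hybrid 0, env 0), (hybrid 1, env 0), (hybrid 0, env 1), (hybrid 1, env 1).
K_g = [[1.0, 0.5, 1.0, 0.5],
       [0.5, 1.0, 0.5, 1.0],
       [1.0, 0.5, 1.0, 0.5],
       [0.5, 1.0, 0.5, 1.0]]

# Hypothetical ECs: 4 records x 2 covariates, each column with mean 0 and
# unit (population) variance, i.e. already centered and standardized.
W = [[ 1.0,  1.0],
     [ 1.0, -1.0],
     [-1.0,  1.0],
     [-1.0, -1.0]]

Omega = scaled_cross_product(W)   # environmental resemblance between records
K_gw = hadamard(K_g, Omega)       # covariance structure of the gw term

for row in K_gw:
    print(row)
```

Records with similar EC profiles and related hybrids receive a large positive entry, which is how this kind of model can borrow information across environments that share conditions.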
(8) Model G+E+W+GxW+GxE: main effects G+E+W with Genomic x Environment Interaction and Genomic x Environmental Covariates Interaction
The interaction term $gE_{ij}$ is incorporated in this model because some GxE might not be completely captured by the interaction term $gw_{ij}$. The model becomes:
$$y_{ij} = \mu + E_i + g_j + w_{ij} + gw_{ij} + gE_{ij} + \varepsilon_{ij}$$
Main and interaction effects included in the different models described above are summarized in Supplementary Table 5. Models using W, i.e., the matrix of environmental covariates, were tested with and without longitude and latitude data included. Additional combinations of main effects and interactions not detailed here were also evaluated, and the results are presented as Supplementary Material. These models were implemented in a Bayesian framework using the R package BGLR (Pérez and de Los Campos, 2014). The MCMC algorithm was run for 42,000 iterations, the first 2,000 of which were discarded as burn-in, with a thinning interval of 5.
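As a quick check on the sampler settings, the short sketch below counts how many MCMC samples contribute to the posterior summaries. It assumes that every 5th iteration after the burn-in is retained; the function name is hypothetical.

```python
def retained_samples(n_iter, burn_in, thin):
    """Count iterations i with burn_in < i <= n_iter and i divisible by thin."""
    return sum(1 for i in range(burn_in + 1, n_iter + 1) if i % thin == 0)

# Settings reported in the text.
n_kept = retained_samples(n_iter=42_000, burn_in=2_000, thin=5)
print(n_kept)
```

Under that assumption, the settings correspond to (42,000 - 2,000) / 5 = 8,000 retained samples per model fit.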