2.5.1. Linear Random Effects Models (LRE Models)

Cathy C. Westhues; Gregory S. Mahone; Sofia da Silva; Patrick Thorwarth; Malthe Schmidt; Jan-Christoph Richter; Henner Simianer; Timothy M. Beissinger

This method is extracted from research article: Front Plant Sci, Nov 2021

Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks

DOI: 10.3389/fpls.2021.699589

In multi-environment trial analysis and plant breeding experiments, linear random effects models (hereafter LRE models) are often used for genomic prediction. In this study, they were compared with machine learning techniques, following the models outlined in Jarquín et al. (2014). In particular, GxE can be modeled with a covariance function equal to the product of the covariance functions of two random linear functions, one of markers and one of environmental covariates, which is equivalent to a reaction norm model (Jarquín et al., 2014). Throughout, an environment refers to a Site x Year combination.

Main effects models

(1) Model G + E: Marker + Environment Main Effects (baseline model)

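The response variable y_ij, the phenotype of the jth hybrid in the ith environment, is modeled as the sum of four terms. These are an overall mean (μ), a random deviation due to the environment (E_i), the genomic random effect of the jth hybrid genotype based on marker covariates (g_j, the G-BLUP component), and an error term (ε_ij):

$$y_{ij} = \mu + E_i + g_j + \varepsilon_{ij}$$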

where $E_{i} \overset{\text{IID}}{\sim} N(0, \sigma_{E}^{2})$, $g \sim N(0, G\sigma_{g}^{2})$ and $\varepsilon_{ij} \overset{\text{IID}}{\sim} N(0, \sigma_{\varepsilon}^{2})$. Here N(.,.) denotes a normally distributed random variable, IID stands for independent and identically distributed, and $\sigma_{E}^{2}$, $\sigma_{g}^{2}$ and $\sigma_{\varepsilon}^{2}$ are the environmental, genomic and residual variances, respectively. Unlike the other terms, the elements of g are correlated through the genomic relationship matrix G.

g_j corresponds to a regression on marker covariates of the form $g_{j} = \sum_{m=1}^{p} x_{jm} b_{m}$, i.e., a linear combination of the p markers and their respective effects. Marker effects were regarded as IID draws from a normal distribution, $b_{m} \overset{\text{IID}}{\sim} N(0, \sigma_{b}^{2})$, m = 1,...,p. The vector g = Xb therefore follows a multivariate normal density with null mean and covariance matrix $\mathrm{Cov}(g) = XX'\sigma_{b}^{2} = G\sigma_{g}^{2}$. Here $G = \frac{XX'}{p}$ is the genomic relationship matrix and $\sigma_{g}^{2} = p\sigma_{b}^{2}$. X is the centered and standardized genotype matrix and p is the total number of markers.
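A minimal NumPy sketch of how G = XX'/p can be computed from raw marker scores. It assumes that markers are coded as 0/1/2 allele counts and standardized column-wise with the population standard deviation. The function name `genomic_relationship` and the toy data are hypothetical. With unit-variance columns, the mean of the diagonal of G is exactly 1. The final draw illustrates g ~ N(0, Gσ_g²) for an arbitrary σ_g².

```python
import numpy as np


def genomic_relationship(M):
    """Return G = X X' / p from an (n hybrids x p markers) score matrix M.

    Columns are centered and scaled to unit variance (population SD);
    monomorphic markers are dropped because they cannot be standardized.
    """
    M = np.asarray(M, dtype=float)
    sd = M.std(axis=0)
    keep = sd > 0
    X = (M[:, keep] - M[:, keep].mean(axis=0)) / sd[keep]
    p = X.shape[1]
    return X @ X.T / p


rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(6, 200))   # toy data: 6 hybrids, 200 markers (0/1/2)
G = genomic_relationship(M)

# Illustrative draw of genomic values g ~ N(0, G * sigma2_g)
sigma2_g = 0.5
g = rng.multivariate_normal(np.zeros(G.shape[0]), G * sigma2_g)
print(G.shape, np.round(np.mean(np.diag(G)), 6))
```

Because the columns of X are centered, the rows of G sum to zero, so G is singular. NumPy's `multivariate_normal` accepts such positive semi-definite covariance matrices.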

(2) Model G + S: Marker + Site Main Effects

Because it includes a site main effect, this model can exploit information from sites evaluated over several years:

$$y_{kj} = \mu + S_k + g_j + \varepsilon_{kj}$$

Here y_kj corresponds to the phenotypic response of the jth genotype in the kth site, with $S_{k} \overset{\text{IID}}{\sim} N(0, \sigma_{S}^{2})$, k = 1,...,K.
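The way a site effect pools records can be made concrete through the marginal covariance of y. It assumes that S, g and ε are mutually independent, so Cov(y) = σ_S² Z_S Z_S' + σ_g² Z_g G Z_g' + σ_ε² I. The sketch below uses hypothetical helper names (`incidence`, `marginal_cov_gs`) and toy variance values. Records from the same site share σ_S² regardless of year, while records of related hybrids share genomic covariance across sites.

```python
import numpy as np


def incidence(labels):
    """0/1 design matrix linking each record to its level (levels sorted)."""
    levels = sorted(set(labels))
    Z = np.zeros((len(labels), len(levels)))
    for r, lab in enumerate(labels):
        Z[r, levels.index(lab)] = 1.0
    return Z, levels


def marginal_cov_gs(sites, hybrids, G, s2_S, s2_g, s2_e):
    """Cov(y) under y_kj = mu + S_k + g_j + e_kj with independent random terms.

    G must be ordered like the sorted hybrid labels.
    """
    Z_S, _ = incidence(sites)
    Z_g, _ = incidence(hybrids)
    return (s2_S * Z_S @ Z_S.T            # shared site effect
            + s2_g * Z_g @ G @ Z_g.T      # genomic similarity between hybrids
            + s2_e * np.eye(len(sites)))  # independent residuals


# Toy data: records of two hybrids at sites A and B
sites = ["A", "A", "B", "B"]
hybrids = ["h1", "h2", "h1", "h2"]
G = np.array([[1.0, 0.2], [0.2, 1.0]])
V = marginal_cov_gs(sites, hybrids, G, s2_S=2.0, s2_g=1.0, s2_e=0.5)
print(np.round(V, 2))
```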

(3) Model G+E+W: Marker + Environment + Environmental Covariates Main Effects

This model additionally incorporates the main effect of the environmental covariates (ECs), including the longitude and latitude coordinates. The environmental effects can be modeled by a random regression on the ECs (W), which represent the environmental conditions experienced by each hybrid in each environment: $w_{ij} = \sum_{q=1}^{Q} W_{ijq} \gamma_{q}$. Here W_ijq is the value of the qth EC for the ijth environment x hybrid combination, γ_q is the main effect of the corresponding EC, and Q is the total number of ECs. The EC effects were considered IID draws from normal densities, i.e., $\gamma_{q} \overset{\text{IID}}{\sim} N(0, \sigma_{\gamma}^{2})$. Consequently, the vector w = Wγ follows a multivariate normal distribution with null mean and covariance matrix $\Omega\sigma_{w}^{2}$, where $\Omega \propto WW'$. The matrix W, which is centered and standardized, contains the values of the ECs. The model then becomes:

$$y_{ij} = \mu + E_i + g_j + w_{ij} + \varepsilon_{ij}$$

with $w \sim N(0, \Omega\sigma_{w}^{2})$.

In this model, as explained in Jarquín et al. (2014), environmental effects are split into two components. The first (w_ij) originates from the regression on the numeric environmental variables. The second (E_i) captures deviations of each Site x Year combination that cannot be accounted for by the ECs. This second component is needed because the environmental variables may not fully explain the differences across environments. Modeling the covariance matrices Ω and G allows information to be borrowed between environments and between hybrid genotypes, respectively.
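A minimal sketch of building Ω from a record-level EC matrix. The text only states Ω ∝ WW′, so the scaling by 1/Q used here is an assumption, chosen by analogy with G = XX'/p. The function name `ec_kernel` and the EC values are hypothetical. Standardizing each column keeps covariates measured in different units on a comparable scale.

```python
import numpy as np


def ec_kernel(W_raw):
    """Omega = W W' / Q for an (n_records x Q) EC matrix (assumed 1/Q scaling).

    Each EC column is centered and scaled to unit variance, so covariates in
    different units (e.g., temperature, rainfall, latitude) are comparable;
    constant columns are dropped because they cannot be standardized.
    """
    W_raw = np.asarray(W_raw, dtype=float)
    sd = W_raw.std(axis=0)
    keep = sd > 0
    W = (W_raw[:, keep] - W_raw[:, keep].mean(axis=0)) / sd[keep]
    return W @ W.T / W.shape[1]


# Toy ECs (temperature, rainfall, latitude) for four records; records 0 and 1
# are two hybrids in the same environment, assumed here to share EC values.
W_raw = [[20.0, 300.0, 48.1],
         [20.0, 300.0, 48.1],
         [25.0, 150.0, 52.3],
         [22.0, 220.0, 50.0]]
Omega = ec_kernel(W_raw)
print(np.round(Omega, 2))
```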

Models with interaction

(4) Model G+E+GxE: main effects G+E with Genomic x Environment Interaction

The model G+E was extended by including the interaction term between environments and markers (GxE):

$$y_{ij} = \mu + E_i + g_j + gE_{ij} + \varepsilon_{ij}$$

with $gE \sim N(0, [Z_{g} G Z_{g}'] \circ [Z_{E} Z_{E}'] \sigma_{gE}^{2})$ and $\varepsilon_{ij} \overset{\text{IID}}{\sim} N(0, \sigma_{\varepsilon}^{2})$. Z_g and Z_E are the design matrices that connect the phenotype entries with hybrid genotypes and with environments, respectively. $\sigma_{gE}^{2}$ is the variance component of the gE_ij interaction term, and ∘ denotes the Hadamard (element-wise) product between two matrices.
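The GxE covariance kernel can be sketched directly from record labels. Two records receive genomic covariance in the gE term only when they come from the same environment. Within an environment, that covariance follows the relationship G between their hybrids. The helper names `incidence` and `gxe_kernel`, the environment labels and G are hypothetical toy inputs.

```python
import numpy as np


def incidence(labels):
    """0/1 design matrix linking each record to its level (levels sorted)."""
    levels = sorted(set(labels))
    Z = np.zeros((len(labels), len(levels)))
    for r, lab in enumerate(labels):
        Z[r, levels.index(lab)] = 1.0
    return Z, levels


def gxe_kernel(hybrids, envs, G):
    """[Z_g G Z_g'] o [Z_E Z_E']; G must be ordered like the sorted hybrid labels."""
    Z_g, _ = incidence(hybrids)
    Z_E, _ = incidence(envs)
    K_g = Z_g @ G @ Z_g.T   # genomic similarity between the hybrids of two records
    K_E = Z_E @ Z_E.T       # 1 if two records come from the same environment
    return K_g * K_E        # `*` on NumPy arrays is the Hadamard product


hybrids = ["h1", "h2", "h1", "h2"]
envs = ["SiteA_2018", "SiteA_2018", "SiteB_2019", "SiteB_2019"]
G = np.array([[1.0, 0.2], [0.2, 1.0]])
K = gxe_kernel(hybrids, envs, G)
print(K)
```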

(5) Model G+S+GxS: main effects G+S with Genomic x Site Interaction

Similar to the previous model, this model extends model G+S by including the interaction term between sites and markers (GxS):

$$y_{kj} = \mu + S_k + g_j + gS_{kj} + \varepsilon_{kj}$$

with $gS \sim N(0, [Z_{g} G Z_{g}'] \circ [Z_{S} Z_{S}'] \sigma_{gS}^{2})$ and $\varepsilon_{kj} \overset{\text{IID}}{\sim} N(0, \sigma_{\varepsilon}^{2})$. Z_S and $\sigma_{gS}^{2}$ are the design matrix for sites and the associated variance component for this interaction, respectively.

(6) Model G+E+S+Y+GxS+GxY+GxE: main effects G+E+S+Y with Genomic x Environment Interaction, Genomic x Site Interaction and Genomic x Year Interaction

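This model is the most complete one that uses only basic information about environments (site and year):

$$y_{ij} = \mu + E_i + S_{k(i)} + Y_{l(i)} + g_j + gS_{k(i)j} + gY_{l(i)j} + gE_{ij} + \varepsilon_{ij}$$

Here, k(i) and l(i) denote the site and the year of environment i.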

The new Genomic x Year term follows $gY \sim N(0, [Z_{g} G Z_{g}'] \circ [Z_{Y} Z_{Y}'] \sigma_{gY}^{2})$, and $\varepsilon_{ij} \overset{\text{IID}}{\sim} N(0, \sigma_{\varepsilon}^{2})$. Z_Y and $\sigma_{gY}^{2}$ are the design matrix for years and the associated variance component for this interaction, respectively.
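Because an environment is a Site x Year combination, the environment design matrices are tied to the site and year ones. Two records share an environment exactly when they share both the site and the year, so Z_E Z_E' = (Z_S Z_S') ∘ (Z_Y Z_Y'). The sketch below checks this identity on hypothetical toy labels. The helper `incidence` is illustrative only. As a consequence, the GxE kernel is nonzero only for record pairs for which both the GxS and the GxY kernels are nonzero.

```python
import numpy as np


def incidence(labels):
    """0/1 design matrix linking each record to its level (levels sorted)."""
    levels = sorted(set(labels))
    Z = np.zeros((len(labels), len(levels)))
    for r, lab in enumerate(labels):
        Z[r, levels.index(lab)] = 1.0
    return Z, levels


sites = ["S1", "S1", "S2", "S2", "S1"]
years = ["2017", "2018", "2017", "2018", "2018"]
envs = [f"{s}_{y}" for s, y in zip(sites, years)]   # environment = Site x Year

Z_S, _ = incidence(sites)
Z_Y, _ = incidence(years)
Z_E, env_levels = incidence(envs)

K_S = Z_S @ Z_S.T   # 1 if two records share a site
K_Y = Z_Y @ Z_Y.T   # 1 if two records share a year
K_E = Z_E @ Z_E.T   # 1 if two records share an environment

# Sharing an environment means sharing both the site and the year
print(np.array_equal(K_E, K_S * K_Y))
```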

(7) Model G+E+W+GxW: main effects G+E+W with interactions between markers and environmental covariates

The model G+E+W was extended by adding the interaction between genomic markers and environmental covariates. Jarquín et al. (2014) demonstrated that the interaction term induced by the reaction norm model has a covariance structure that, under standard assumptions, corresponds to the Hadamard product of two covariance structures. One characterizes the relationships between lines based on marker information (e.g., G), and the other describes the environmental resemblance based on ECs (e.g., Ω). The vector of random effects gw, which represents the interaction terms between markers and ECs, is therefore assumed to follow a multivariate normal distribution with null mean and covariance structure $[Z_{g} G Z_{g}'] \circ \Omega$. The model can be expressed as follows:

$$y_{ij} = \mu + E_i + g_j + w_{ij} + gw_{ij} + \varepsilon_{ij}$$

with $gw \sim N(0, [Z_{g} G Z_{g}'] \circ \Omega\, \sigma_{gw}^{2})$.
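The Hadamard form can be checked numerically. In the reaction norm model, the interaction for record r of hybrid j(r) is a regression on all marker x EC products. With the notation θ_mq introduced here for those IID effects, $gw_r = \sum_{m}\sum_{q} x_{j(r)m} W_{rq} \theta_{mq}$. It follows that $\mathrm{Cov}(gw_r, gw_s) \propto (x_{j(r)}'x_{j(s)})(W_r'W_s)$, which is the (r, s) entry of $[Z_g G Z_g'] \circ \Omega$ up to a constant. The sketch below uses random stand-in matrices rather than real markers or ECs, and assumes Ω = WW'/Q. It builds the explicit product covariates with `np.kron` and compares the two kernels.

```python
import numpy as np

rng = np.random.default_rng(7)
n_hyb, p, n_rec, Q = 3, 50, 6, 4

X = rng.standard_normal((n_hyb, p))     # stand-in for standardized markers
W = rng.standard_normal((n_rec, Q))     # stand-in for standardized ECs, one row per record
hybrid_of_record = np.array([0, 1, 2, 0, 1, 2])

G = X @ X.T / p
Omega = W @ W.T / Q
Z_g = np.eye(n_hyb)[hybrid_of_record]   # record-to-hybrid design matrix

# Explicit interaction covariates: for record r, all products x_{j(r), m} * W_{r, q}
F = np.stack([np.kron(X[j], W[r]) for r, j in enumerate(hybrid_of_record)])

K_direct = F @ F.T / (p * Q)            # kernel of the explicit marker x EC regression
K_hadamard = (Z_g @ G @ Z_g.T) * Omega  # Hadamard form used in the model

print(np.allclose(K_direct, K_hadamard))
```

The Hadamard form avoids constructing the p x Q product covariates explicitly. This matters when p is large.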

(8) Model G+E+W+GxW+GxE: main effects G+E+W with Genomic x Environment Interaction and Genomic x Environmental Covariates Interaction

The interaction term gE_ij is incorporated in this model because some GxE might not be completely captured by the interaction term gw_ij. The model becomes:

$$y_{ij} = \mu + E_i + g_j + w_{ij} + gE_{ij} + gw_{ij} + \varepsilon_{ij}$$
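This worked expression shows how the covariance structures described above combine for model (8). It assumes, as is usual for this class of models, that all random terms are mutually independent. Under that assumption, the marginal covariance of the phenotype vector y is the sum of the component kernels weighted by their variance components:

```latex
\operatorname{Cov}(\mathbf{y}) =
  \sigma_E^2\, Z_E Z_E'
  + \sigma_g^2\, Z_g G Z_g'
  + \sigma_w^2\, \Omega
  + \sigma_{gE}^2\, \left[Z_g G Z_g'\right] \circ \left[Z_E Z_E'\right]
  + \sigma_{gw}^2\, \left[Z_g G Z_g'\right] \circ \Omega
  + \sigma_\varepsilon^2\, I
```

The other models follow by dropping the corresponding terms, or by replacing Z_E with Z_S or Z_Y.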

The main and interaction effects included in the different models described above are summarized in Supplementary Table 5. Models using W, i.e., the matrix of environmental covariates, were tested with and without longitude and latitude data. Additional combinations of main effects and interactions not detailed here were also evaluated; their results are presented in the Supplementary Material. These models were implemented in a Bayesian framework using the R package BGLR (Pérez and de Los Campos, 2014). The Markov chain Monte Carlo (MCMC) algorithm was run for 42,000 iterations. The first 2,000 iterations were discarded as burn-in, and a thinning interval of 5 was used.
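As a quick arithmetic check of the MCMC settings, which in BGLR correspond to the `nIter`, `burnIn` and `thin` arguments, the sketch below counts the posterior samples retained. It assumes that every 5th iteration after the burn-in is kept, a common convention; BGLR's exact internal indexing is not confirmed here. The function `retained_iterations` is hypothetical.

```python
def retained_iterations(n_iter: int, burn_in: int, thin: int) -> list[int]:
    """Iterations kept when the first `burn_in` draws are discarded and every
    `thin`-th iteration is stored afterwards (assumed convention)."""
    return [i for i in range(1, n_iter + 1) if i > burn_in and i % thin == 0]


kept = retained_iterations(n_iter=42_000, burn_in=2_000, thin=5)
print(len(kept), kept[0], kept[-1])  # prints: 8000 2005 42000
```

Under this convention, (42,000 - 2,000) / 5 = 8,000 draws contribute to the posterior means.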

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
