2.3. From GBLUP to ssGBLUP

Daniela Lourenco
Andres Legarra
Shogo Tsuruta
Yutaka Masuda
Ignacio Aguilar
Ignacy Misztal

Understanding the difference between GBLUP and ssGBLUP is a crucial step. Because the two are still often confused, an explanation of GBLUP is provided first.

GBLUP is equivalent to SNP-BLUP, but GBLUP estimates genomic breeding values (u = Za) instead of the SNP effects (a) estimated in SNP-BLUP. It also assumes that SNP effects follow a normal distribution a priori, so that the majority of SNPs have a small effect and very few have a moderate to large effect. Using a simple animal model as shown in (14) and (15):
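For the single-trait animal model $y = Xb + Wu + e$, the GBLUP mixed model equations take the following standard form. This is a sketch rebuilt from the definitions given in the next sentence; the hats on b and u (denoting solutions) are notation assumed here.

```latex
\[
\begin{bmatrix}
X'X & X'W \\
W'X & W'W + G^{-1}\dfrac{\sigma_e^2}{\sigma_u^2}
\end{bmatrix}
\begin{bmatrix} \hat{b} \\ \hat{u} \end{bmatrix}
=
\begin{bmatrix} X'y \\ W'y \end{bmatrix}
\]
```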

where W is the incidence matrix for the animal effect (u), X is the incidence matrix for fixed effects (b), $\sigma_e^2$ is the residual variance, and $u \sim N(0, G\sigma_u^2)$.

Therefore, GBLUP is a BLUP in which A is replaced by the genomic relationship matrix (G). The effectiveness of GBLUP depends on how well G approximates the realized genetic relationships. In addition, performing quality control of the genomic data before constructing G avoids biases and losses of accuracy.
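As a minimal sketch of GBLUP as "BLUP with G in place of A", the NumPy snippet below assembles and solves the mixed model equations for a toy data set. The function name `solve_gblup`, the 4 x 4 matrix G, the phenotypes, and the variance components are all made up for illustration; a real evaluation would use software designed for large systems.

```python
import numpy as np

def solve_gblup(y, X, W, G, sigma2_e, sigma2_u):
    """Solve the GBLUP mixed model equations; returns (b_hat, u_hat)."""
    lam = sigma2_e / sigma2_u                      # variance ratio
    G_inv = np.linalg.inv(G)                       # G must be invertible
    lhs = np.block([[X.T @ X, X.T @ W],
                    [W.T @ X, W.T @ W + lam * G_inv]])
    rhs = np.concatenate([X.T @ y, W.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    p = X.shape[1]                                 # number of fixed effects
    return sol[:p], sol[p:]

# Toy inputs (hypothetical values): four genotyped animals, one record each.
G = np.array([
    [1.00, 0.45, 0.10, 0.05],
    [0.45, 1.02, 0.08, 0.12],
    [0.10, 0.08, 0.97, 0.50],
    [0.05, 0.12, 0.50, 1.01],
])
y = np.array([10.2, 9.8, 11.1, 10.5])
X = np.ones((4, 1))        # overall mean as the only fixed effect
W = np.eye(4)              # each record belongs to one animal
sigma2_e, sigma2_u = 1.0, 0.5

b_hat, u_hat = solve_gblup(y, X, W, G, sigma2_e, sigma2_u)
print("fixed effect:", b_hat)
print("genomic breeding values:", u_hat)
```

Replacing `G` by a pedigree matrix A in the call gives traditional BLUP with the same code, which is the point made in the paragraph above.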

If we assume that not all the genetic variance is explained by markers, an extra polygenic effect can be included to explain the remaining variance. In this case, the model in (14) becomes:

where g is a vector of residual polygenic effects that are not captured by the SNPs. Assuming that α is the proportion of variance explained by SNPs, the total additive genetic effect ($u_g$) becomes

Therefore,
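Taking u and g to be independent, with g distributed according to the pedigree relationships among genotyped animals ($A_{22}$), the model, the total additive effect, and its variance can be sketched as follows. The symbols $\sigma_{u_g}^2$ (total additive variance) and $G_w$ (blended matrix) are labels assumed here for illustration.

```latex
\[
\begin{aligned}
y &= Xb + Wu + Wg + e \\
u_g &= u + g, \qquad
u \sim N\!\left(0,\ \alpha G \sigma_{u_g}^2\right), \quad
g \sim N\!\left(0,\ (1-\alpha) A_{22} \sigma_{u_g}^2\right) \\
\mathrm{Var}(u_g) &= \left[\alpha G + (1-\alpha) A_{22}\right]\sigma_{u_g}^2
 = G_w\,\sigma_{u_g}^2
\end{aligned}
\]
```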

In real situations, it is assumed that α varies from 0.8 to 0.95. Note that this is also going to make G invertible [17]. When (1 − α) is used strictly to make G positive (semi-)definite, it is called a blending parameter.
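The effect of blending on invertibility can be checked numerically. The sketch below assumes a G built from centered genotypes and scaled by 2Σp(1 − p), a common construction that is not defined in this section, and uses an identity matrix as a stand-in for $A_{22}$ (unrelated, non-inbred animals). Genotypes are simulated and the helper name `blend` is hypothetical.

```python
import numpy as np

def blend(G, A22, alpha=0.95):
    """Blended matrix alpha*G + (1 - alpha)*A22."""
    return alpha * G + (1.0 - alpha) * A22

rng = np.random.default_rng(0)
n_animals, n_snps = 5, 40
M = rng.integers(0, 3, size=(n_animals, n_snps)).astype(float)  # 0/1/2 codes

# Assumed construction: center by twice the observed allele frequency.
freq = M.mean(axis=0) / 2.0
Z = M - 2.0 * freq
G = Z @ Z.T / (2.0 * np.sum(freq * (1.0 - freq)))

A22 = np.eye(n_animals)          # assumption: no pedigree relationships
G_w = blend(G, A22, alpha=0.95)

# Centering with observed frequencies makes every column of Z sum to zero,
# so the raw G has a zero eigenvalue and cannot be inverted.
min_eig_raw = np.linalg.eigvalsh(G).min()
min_eig_blend = np.linalg.eigvalsh(G_w).min()
print("smallest eigenvalue, raw G:    ", min_eig_raw)
print("smallest eigenvalue, blended G:", min_eig_blend)
```

With an identity $A_{22}$, every eigenvalue of the blended matrix is shifted to at least 1 − α, so the matrix becomes positive definite.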

Although GBLUP has been widely used in animal and plant breeding applications, its main limitation is that only genotyped animals are in the model. As only a fraction of animals is genotyped, GBLUP may use less phenotypic and pedigree information than BLUP. Because of that, extra steps are needed to combine genomic and pedigree information. When GBLUP, SNP-BLUP, or Bayesian models are used in this way, the genomic evaluation method is called multistep. The steps involved in multistep are: (1) estimation of EBV using traditional BLUP (i.e., all available information); (2) de-regression of EBV, which condenses information from phenotypes (e.g., daughter yield deviation in dairy cattle); (3) estimation of SNP effects using GBLUP or other models; (4) prediction of Za, also known as direct genomic values (DGVs); (5) blending of DGVs with the average of the parents' EBV, known as parent average (PA), with published EBV, or with predicted transmitting ability (PTA). The main issue with an evaluation in several steps is that errors and biases can be introduced at each step [10], and that BLUP will not be robust to selection decisions based on genomic information [13].

The idea for ssGBLUP came from the fact that usually only a small portion of the animals in a given population is genotyped. The best way to avoid several steps would therefore be to combine pedigree and genomic relationships and use the resulting matrix as the covariance structure in the MME. Legarra et al. [15] stated that genomic evaluations would be simpler if genomic relationships were available for all animals in the model. Their idea was to treat A as the a priori relationships and G as the observed relationships; however, G is observed only for some individuals, and those individuals have $A_{22}$ as their a priori relationships. Based on that, it was shown that genomic information can be extended to non-genotyped animals through the joint distribution of breeding values of non-genotyped ($u_1$) and genotyped ($u_2$) animals [15,17]:

If we consider that

where subscripts 1 and 2 represent non-genotyped and genotyped animals, respectively. The conditional distribution of breeding values of non-genotyped animals, given those of genotyped animals, is
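With A partitioned by genotyping status, the joint density factorizes and the conditional distribution follows from multivariate normal theory. The lines below are a sketch in this section's notation, assuming the pedigree-based prior $u \sim N(0, A\sigma_u^2)$.

```latex
\[
\begin{aligned}
p(u_1, u_2) &= p(u_1 \mid u_2)\, p(u_2) \\[4pt]
A &= \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \\[4pt]
u_1 \mid u_2 &\sim N\!\left(A_{12}A_{22}^{-1}u_2,\ 
 \left(A_{11} - A_{12}A_{22}^{-1}A_{21}\right)\sigma_u^2\right)
\end{aligned}
\]
```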

If $u_2$ in $A_{12}A_{22}^{-1}u_2$ is replaced by a vector of observed gene content, the formula can be used to estimate gene content for non-genotyped animals from the observed gene content of genotyped animals [39]. This implies that, through $A_{12}A_{22}^{-1}u_2$, genomic information is implicitly imputed from genotyped to non-genotyped animals based on pedigree relationships. The variance of the distribution, $A_{11} - A_{12}A_{22}^{-1}A_{21}$, is the prediction error term. Because the animals with subscript 1 have no genotypes, this variance depends on their pedigree relationships with genotyped animals. In this way, variances and covariances are:

Rearranging:

Therefore,

Finally, the matrix that contains the joint relationships of genotyped and non-genotyped animals is given by:

which can be simplified to:
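Writing $u_1 = A_{12}A_{22}^{-1}u_2 + \varepsilon$ with $\mathrm{Var}(u_2) = G\sigma_u^2$ and $\mathrm{Var}(\varepsilon) = (A_{11} - A_{12}A_{22}^{-1}A_{21})\sigma_u^2$, the steps from the variances to H can be sketched as follows. The residual $\varepsilon$ is a symbol assumed here, and $\sigma_u^2$ is factored out throughout.

```latex
\[
\begin{aligned}
\mathrm{Var}(u_2) &= G \\
\mathrm{Var}(u_1) &= A_{12}A_{22}^{-1}\,G\,A_{22}^{-1}A_{21}
                     + A_{11} - A_{12}A_{22}^{-1}A_{21} \\
                  &= A_{11} + A_{12}A_{22}^{-1}\left(G - A_{22}\right)A_{22}^{-1}A_{21} \\
\mathrm{Cov}(u_1, u_2) &= A_{12}A_{22}^{-1}\,G \\[4pt]
H &= \begin{bmatrix}
A_{11} + A_{12}A_{22}^{-1}(G - A_{22})A_{22}^{-1}A_{21} & A_{12}A_{22}^{-1}G \\
G A_{22}^{-1}A_{21} & G
\end{bmatrix} \\[4pt]
  &= A + \begin{bmatrix}
A_{12}A_{22}^{-1}(G - A_{22})A_{22}^{-1}A_{21} & A_{12}A_{22}^{-1}(G - A_{22}) \\
(G - A_{22})A_{22}^{-1}A_{21} & G - A_{22}
\end{bmatrix}
\end{aligned}
\]
```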

This H matrix is, therefore, a relationship matrix constructed with SNP markers and pedigree, where the SNP information is projected to the individuals that are not genotyped. Its properties include always being positive semi-definite, and being positive definite and invertible if G is invertible. Although H is very complicated, its inverse ($H^{-1}$) is quite simple [16,17]:
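In the partitioned notation used for A, with non-genotyped animals ordered first, the inverse has the following well-known form (a sketch; the zero blocks correspond to non-genotyped animals):

```latex
\[
H^{-1} = A^{-1} +
\begin{bmatrix}
0 & 0 \\
0 & G^{-1} - A_{22}^{-1}
\end{bmatrix}
\]
```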

As both $A^{-1}$ and $G^{-1}$ capture relationships, $A_{22}^{-1}$ should be subtracted to avoid double-counting of pedigree information for genotyped animals.
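The projection $A_{12}A_{22}^{-1}$ and the two forms of H can be checked on a toy example. The sketch below assumes a hypothetical five-animal pedigree and a made-up G for the three genotyped founders; the function names are illustrative. It first imputes gene content for the non-genotyped animals, then builds H directly and $H^{-1}$ from the short formula, and confirms that the two are inverses of each other.

```python
import numpy as np

# Pedigree relationships, non-genotyped animals first.
# Hypothetical pedigree: c, d, e are unrelated founders (genotyped);
# a is the offspring of c and d, b is the offspring of a and e.
# Row/column order: a, b, c, d, e.
A = np.array([
    [1.00, 0.50, 0.50, 0.50, 0.00],
    [0.50, 1.00, 0.25, 0.25, 0.50],
    [0.50, 0.25, 1.00, 0.00, 0.00],
    [0.50, 0.25, 0.00, 1.00, 0.00],
    [0.00, 0.50, 0.00, 0.00, 1.00],
])
n1 = 2  # number of non-genotyped animals (a and b)

# Made-up genomic relationships among the genotyped animals c, d, e.
G = np.array([
    [1.02, 0.05, -0.03],
    [0.05, 0.98, 0.04],
    [-0.03, 0.04, 1.01],
])

def build_H(A, G, n1):
    """Direct construction of H from the partitioned formula."""
    A12, A22 = A[:n1, n1:], A[n1:, n1:]
    P = A12 @ np.linalg.inv(A22)      # projection A12 * A22^-1
    D = G - A22                       # genomic minus pedigree relationships
    return A + np.block([[P @ D @ P.T, P @ D],
                         [D @ P.T,     D]])

def build_H_inv(A, G, n1):
    """H^-1 = A^-1 + [[0, 0], [0, G^-1 - A22^-1]]."""
    H_inv = np.linalg.inv(A)
    H_inv[n1:, n1:] += np.linalg.inv(G) - np.linalg.inv(A[n1:, n1:])
    return H_inv

# Gene content (copies of one allele) observed on c, d, e, then
# projected onto the non-genotyped animals a and b.
gene_content_2 = np.array([2.0, 2.0, 0.0])
P = A[:n1, n1:] @ np.linalg.inv(A[n1:, n1:])
imputed = P @ gene_content_2
print("imputed gene content for a, b:", imputed)

H = build_H(A, G, n1)
H_inv = build_H_inv(A, G, n1)
print("max |H @ H_inv - I|:", np.abs(H @ H_inv - np.eye(5)).max())
```

In this toy pedigree the imputed values are simply expected gene contents given the parents: a has two homozygous parents, and b receives one allele from a and one from e. Note that only $A^{-1}$, $G^{-1}$, and $A_{22}^{-1}$ are needed for `build_H_inv`; H itself never has to be formed.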

Assuming the following animal model:

The MME for ssGBLUP becomes:

The distribution of u becomes:
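In the same notation as for GBLUP, the model, the mixed model equations, and the distribution of u can be sketched as follows. A single-trait model is assumed, with W now relating records to all animals in the pedigree, genotyped or not; the hats denoting solutions are notation assumed here.

```latex
\[
\begin{aligned}
& y = Xb + Wu + e \\[4pt]
& \begin{bmatrix}
X'X & X'W \\
W'X & W'W + H^{-1}\dfrac{\sigma_e^2}{\sigma_u^2}
\end{bmatrix}
\begin{bmatrix} \hat{b} \\ \hat{u} \end{bmatrix}
=
\begin{bmatrix} X'y \\ W'y \end{bmatrix} \\[4pt]
& u \sim N\!\left(0,\ H\sigma_u^2\right)
\end{aligned}
\]
```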

Therefore, the only difference between BLUP and ssGBLUP is that $A^{-1}$ is replaced by $H^{-1}$. Consequently, all tools based on the BLUP mixed model equations, such as restricted maximum likelihood (REML [40]), can be easily converted to single-step. In a nutshell, if all animals are genotyped, ssGBLUP becomes GBLUP, and if no animals are genotyped, ssGBLUP becomes BLUP.
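The two limiting cases can be illustrated with a small sketch in which one solver is used for BLUP, GBLUP, and ssGBLUP, and only the inverse relationship matrix changes. The pedigree matrix, the genomic matrix (a simple perturbation of A), the phenotypes, and the variance ratio are all made-up values; `h_inverse` and `solve_mme` are hypothetical helper names.

```python
import numpy as np

def h_inverse(A, G, n_non):
    """H^-1 with the first n_non animals non-genotyped; G covers the rest."""
    H_inv = np.linalg.inv(A)
    if n_non < A.shape[0]:                      # at least one genotyped animal
        A22 = A[n_non:, n_non:]
        H_inv[n_non:, n_non:] += np.linalg.inv(G) - np.linalg.inv(A22)
    return H_inv

def solve_mme(y, X, W, K_inv, lam):
    """Solve the mixed model equations for any inverse relationship matrix."""
    lhs = np.block([[X.T @ X, X.T @ W],
                    [W.T @ X, W.T @ W + lam * K_inv]])
    rhs = np.concatenate([X.T @ y, W.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    p = X.shape[1]
    return sol[:p], sol[p:]

# Toy pedigree matrix for five animals (hypothetical values).
A = np.array([
    [1.00, 0.50, 0.50, 0.50, 0.00],
    [0.50, 1.00, 0.25, 0.25, 0.50],
    [0.50, 0.25, 1.00, 0.00, 0.00],
    [0.50, 0.25, 0.00, 1.00, 0.00],
    [0.00, 0.50, 0.00, 0.00, 1.00],
])
G_full = 0.9 * A + 0.1 * np.eye(5)   # made-up "genomic" matrix for all five

y = np.array([10.2, 9.8, 11.1, 10.5, 9.9])
X = np.ones((5, 1))
W = np.eye(5)
lam = 2.0                            # sigma2_e / sigma2_u, assumed known

# ssGBLUP: animals 1-2 non-genotyped, animals 3-5 genotyped.
b_ss, u_ss = solve_mme(y, X, W, h_inverse(A, G_full[2:, 2:], 2), lam)

# Limiting cases: nobody genotyped, everybody genotyped.
b_none, u_none = solve_mme(y, X, W, h_inverse(A, None, 5), lam)
b_all, u_all = solve_mme(y, X, W, h_inverse(A, G_full, 0), lam)

# Reference BLUP and GBLUP solutions for comparison.
b_blup, u_blup = solve_mme(y, X, W, np.linalg.inv(A), lam)
b_gblup, u_gblup = solve_mme(y, X, W, np.linalg.inv(G_full), lam)

print("ssGBLUP:", u_ss)
print("no genotypes equals BLUP:  ", np.allclose(u_none, u_blup))
print("all genotyped equals GBLUP:", np.allclose(u_all, u_gblup))
```

Because the solver never needs to know which matrix it was given, any BLUP-based routine that accepts an inverse relationship matrix can be reused for single-step in the same way.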

Advantages of ssGBLUP include simplicity of use, simultaneous fitting of genomic information and estimation of fixed effects [10], higher accuracy than multistep methods [41,42,43,44,45], the potential to account for pre-selection bias because all pedigree, phenotypic, and genomic information can be included in the model [12,13], and easy extension to any model.
