Simulated Data

Jérémie Vandenplas
Mario P L Calus
Jan ten Napel

The quality of genomic predictions from a sparse ssGBLUP in crossbreeding schemes was assessed by simulating a 3-way crossbreeding program with random selection (Fig. 1). Historic, purebred, and crossbred recent populations were simulated using the QMSim software (Sargolzaei and Schenkel, 2009). For the historic population, 70 discrete random mating generations (i.e., generations 1 to 70) with a constant size of 18,840 individuals and equal numbers of males and females were simulated, followed by 10 generations (i.e., generations 71 to 80) in which the effective population size was gradually reduced to 390 individuals. The next 20 generations (i.e., generations 81 to 100) were simulated to gradually expand the population size back to 18,840. The last generation (i.e., generation 100) included 90 males and 18,750 females. Matings in all generations were based on the random union of gametes, which were randomly sampled from the pools of male and female gametes. To simulate the 3 breed populations (hereafter referred to as breeds A, B, and C), 3 random samples were drawn from generation 100 of the historic population, each including 30 males and 6,250 females. Subsequently, within each breed, 100 generations (i.e., generations 101 to 200) of random mating were simulated before starting the 3-way crossbreeding program (Fig. 1). In each of these 100 generations of random mating, each female had 1 male and 1 female offspring.
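The population size schedule of the historic population can be illustrated with a minimal sketch. It assumes that "gradually" means a linear change per generation, which the text does not state; QMSim may use a different trajectory. The function name `historic_sizes` and its parameters are hypothetical.

```python
def historic_sizes(n_large=18_840, n_small=390,
                   n_constant=70, n_shrink=10, n_expand=20):
    """Return {generation: population size} for generations 1 to 100.

    Assumption: the bottleneck and the expansion are linear in the
    generation number; the source only says "gradually".
    """
    sizes = {}
    # Generations 1 to 70: constant size.
    for gen in range(1, n_constant + 1):
        sizes[gen] = n_large
    # Generations 71 to 80: shrink towards the bottleneck size.
    for step in range(1, n_shrink + 1):
        gen = n_constant + step
        sizes[gen] = round(n_large + (n_small - n_large) * step / n_shrink)
    # Generations 81 to 100: expand back to the original size.
    for step in range(1, n_expand + 1):
        gen = n_constant + n_shrink + step
        sizes[gen] = round(n_small + (n_large - n_small) * step / n_expand)
    return sizes


sizes = historic_sizes()

# The 3 breed samples (30 males and 6,250 females each) together use
# exactly as many animals as generation 100 contains.
founders_per_breed = 30 + 6_250
total_founders = 3 * founders_per_breed
```

Under these assumptions the three founder samples account for all 90 males and 18,750 females of generation 100.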

Fig. 1. Schematic representation of the simulation. The crossbreeding program started at generation 200 (generation numbers in bold). The number of males (M) and females (F) per generation and per breed (A, B, and C), or per cross [BC and A(BC)], is reported within brackets. Blue arrows denote the sires and dams of the next generation; red arrows denote the dams of the next generation; and green arrows denote the sires of the next generation.

In the second step, a 3-way crossbreeding program was simulated (Fig. 1). The purebred (i.e., A, B, and C) animals used as founders of the pedigree (i.e., the first generation of the pedigree) were from generation 200. For each of the breeds A, B, and C, the next 9 discrete generations (i.e., generations 201 to 209) of purebred animals were simulated by random selection and random mating while maintaining a constant size of 30 males and 6,250 females. To mimic a 3-way crossbreeding program, from generation 205 until generation 208, B and C purebred animals were randomly crossed to produce 4 generations (i.e., generations 206 to 209) of F1 animals, that is, 30 BC crossbred males and 6,250 BC crossbred females per generation. These BC crossbred animals were then randomly mated to males from breed A to produce 4 generations (i.e., generations 206 to 209) of F2 animals, called A(BC) crossbred animals. For each generation, 6,280 A(BC) crossbred animals were simulated (Fig. 1). Purebred animals that were used as parents of crossbred animals could also be parents of purebred animals in the next generation. A total of 5 replicates were simulated using the QMSim software.

The genome was simulated using the QMSim software, simultaneously with the simulation of the historic, purebred, and crossbred recent populations. The genome consisted of 18 chromosomes designed to resemble the Sus scrofa genome, with a SNP density comparable to that of a 60k SNP chip. The SNP positions were randomized across the genome, and a recurrent mutation rate of $2.5 \times 10^{-5}$, as well as 1 mean crossover per Morgan, was assumed. All SNPs that segregated in the last historical generation (i.e., generation 100) with a minor allele frequency (MAF) higher than or equal to 0.05 were selected and used to simulate the genotypes of the purebred and crossbred animals. In addition to the SNPs, 4,500 QTL were simulated, and their positions were also randomized across the genome. The mutation rate and MAF threshold of the QTL were the same as those of the simulated SNPs.
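The MAF filter described above can be sketched as follows. This is only an illustration, not the QMSim implementation: genotypes are assumed to be stored as allele counts (0, 1, or 2) per animal, and the function names `minor_allele_frequency` and `select_snps` are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF of one SNP from a list of 0/1/2 allele counts (one per animal)."""
    p = sum(genotypes) / (2 * len(genotypes))  # frequency of the counted allele
    return min(p, 1.0 - p)


def select_snps(genotype_columns, maf_threshold=0.05):
    """Return indices of SNPs whose MAF is at least maf_threshold.

    A SNP that does not segregate has a MAF of 0 and is dropped as well.
    """
    return [
        index
        for index, column in enumerate(genotype_columns)
        if minor_allele_frequency(column) >= maf_threshold
    ]


# Toy data: 3 SNPs genotyped on 20 animals each.
monomorphic = [0] * 20                  # not segregating, MAF = 0
rare = [1] + [0] * 19                   # MAF = 1/40 = 0.025
common = [0, 1, 2, 1] * 5               # MAF = 0.5
kept = select_snps([monomorphic, rare, common])
```

In this toy example only the third SNP passes the threshold.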

For all purebred and crossbred animals, a phenotype was simulated for the trait corresponding to the animal's own breed composition, under additive gene action, using a custom Fortran program. This resulted in 5 traits: 1 trait for each of the purebred performances A, B, and C, and 1 trait for each of the crossbred performances BC and A(BC). Genetic correlations between traits were randomly sampled from a uniform distribution in the range of 0.2 to 0.8. The simulated genetic correlations between purebred and crossbred traits were at the low end of the range of values reported in the literature, as reviewed by Wientjes and Calus (2017) (Table 1). Heritabilities ($h_i^2$) were randomly sampled from a uniform distribution in the range of 0.2 to 0.4. Residual covariances were set to zero, as they would be in practice, because each animal has a phenotype for only 1 of the 5 traits. The same genetic correlations and heritabilities were used in all replicates and are reported in Table 1.
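The sampling of the genetic parameters can be sketched as below. This is a minimal illustration, not the authors' Fortran program: the function name `sample_parameters`, the seed, and the dictionary layout are assumptions. Note that independently sampled correlations are not guaranteed to form a positive definite matrix, and the text does not describe how this was handled.

```python
import random

TRAITS = ["A", "B", "C", "BC", "A(BC)"]


def sample_parameters(seed=1):
    """Sample one heritability per trait and one correlation per trait pair."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    # One heritability per trait, uniform between 0.2 and 0.4.
    h2 = {trait: rng.uniform(0.2, 0.4) for trait in TRAITS}
    # One genetic correlation per unordered pair of traits,
    # uniform between 0.2 and 0.8.
    rg = {}
    for i, trait_i in enumerate(TRAITS):
        for trait_j in TRAITS[i + 1:]:
            rg[(trait_i, trait_j)] = rng.uniform(0.2, 0.8)
    return h2, rg


h2, rg = sample_parameters()
```

With 5 traits this yields 5 heritabilities and 10 pairwise genetic correlations, which are then held fixed across replicates.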

Table 1. Heritabilities (diagonal) and genetic correlations (off-diagonal) among the 5 simulated traits.

For each animal and for each of the 5 traits, a true breeding value (TBV) was simulated by summing a polygenic effect and the product of the simulated allele substitution effects and the genotypes at the 4,500 QTL, coded as 0, 1, and 2. This genotype-based term allowed different genetic levels across breeds for the same trait because QTL allele frequencies differ across breeds. For each trait, the polygenic effect of each individual was equal to the sum of the average of the polygenic effects of its parents and a Mendelian sampling term. The Mendelian sampling terms for the 5 traits were sampled from a multivariate normal distribution with means of 0 and variances equal to the Mendelian sampling variances (Mrode, 2005). Correlations between the simulated Mendelian sampling terms were assumed to be equal to the genetic correlations. The variance of the polygenic effect of each ith trait was assumed to be equal to 5% of the total additive genetic variance ($\sigma_{A_i}^2$).
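For a single trait, the construction of the polygenic effect and the TBV can be sketched as follows. This is a simplified illustration, not the authors' program: it handles one trait at a time and therefore ignores the correlations between Mendelian sampling terms across traits, and it takes the Mendelian sampling variance to be half the polygenic variance, reduced by the average parental inbreeding (the standard result; parents are assumed non-inbred by default). All function and parameter names are hypothetical.

```python
import random


def polygenic_effect(sire_effect, dam_effect, var_polygenic, rng,
                     f_sire=0.0, f_dam=0.0):
    """Parent average plus a Mendelian sampling term (single trait)."""
    parent_average = 0.5 * (sire_effect + dam_effect)
    # Mendelian sampling variance: half the polygenic variance, reduced
    # by the average inbreeding coefficient of the two parents.
    var_ms = 0.5 * (1.0 - 0.5 * (f_sire + f_dam)) * var_polygenic
    return parent_average + rng.gauss(0.0, var_ms ** 0.5)


def true_breeding_value(polygenic, qtl_genotypes, qtl_effects):
    """Polygenic effect plus the sum of genotype (0/1/2) times QTL effect."""
    qtl_part = sum(g * a for g, a in zip(qtl_genotypes, qtl_effects))
    return polygenic + qtl_part


rng = random.Random(42)  # arbitrary seed
# With zero polygenic variance the offspring equals the parent average.
no_variance = polygenic_effect(1.0, 3.0, 0.0, rng)
# Toy animal with 3 QTL instead of 4,500.
tbv = true_breeding_value(0.5, [0, 1, 2], [1.0, -2.0, 0.25])
```

Because the QTL term multiplies effects by allele counts, two breeds with different QTL allele frequencies obtain different mean TBV for the same trait.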

The allele substitution effects of the QTL were sampled from a multivariate normal distribution with means of 0 and variances of 1. The correlations between allele substitution effects of the QTL underlying the 5 traits were equal to the genetic correlations. For each trait, the genetic variance explained by all QTL was computed as the sum of the variances across all QTL, assuming no correlation between the QTL. The simulated additive genetic variance of each jth QTL was calculated as $\sigma_{g_j}^2 = 2 p_j (1 - p_j) a_j^2$, where $p_j$ is the allele frequency and $a_j$ is the allele substitution effect of the jth QTL. For each trait, the allele substitution effects were rescaled to obtain an additive genetic variance explained by the QTL ($\sigma_g^2$) equal to 1. The part of the total additive genetic variance explained by the QTL was assumed to be equal to 95% for each ith trait. Finally, the phenotypes for each trait for each animal were generated by summing the TBV and a residual error sampled from a normal distribution with a mean of 0 and a variance equal to $\left(\frac{1}{h_i^2} - 1\right) \times \sigma_{A_i}^2$.
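The variance bookkeeping in this paragraph can be checked with a short sketch for one trait. It is an illustration under the stated definitions rather than the authors' code: the function names are hypothetical and the toy example uses 3 QTL instead of 4,500. The total additive genetic variance is taken as 1/0.95, which follows from the QTL variance being rescaled to 1 and representing 95% of the total.

```python
def qtl_variance(freqs, effects):
    """Sum over QTL of 2 * p * (1 - p) * a**2, assuming uncorrelated QTL."""
    return sum(2.0 * p * (1.0 - p) * a * a for p, a in zip(freqs, effects))


def rescale_effects(freqs, effects, target=1.0):
    """Scale all effects so that the QTL variance equals target."""
    scale = (target / qtl_variance(freqs, effects)) ** 0.5
    return [a * scale for a in effects]


def residual_variance(h2, var_a):
    """Residual variance (1 / h2 - 1) * var_a that yields heritability h2."""
    return (1.0 / h2 - 1.0) * var_a


# Toy example with 3 QTL.
freqs = [0.5, 0.2, 0.1]
effects = [1.0, -0.5, 2.0]
scaled = rescale_effects(freqs, effects)

# QTL variance of 1 is 95% of the total additive genetic variance,
# so the polygenic part (5%) is the remainder.
var_a = 1.0 / 0.95
var_polygenic = 0.05 * var_a
var_e = residual_variance(0.3, var_a)
implied_h2 = var_a / (var_a + var_e)
```

The last line recovers the heritability from the additive and residual variances, which confirms that the residual variance formula is consistent with the definition of heritability.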

For all the analyses, the pedigree included all the animals simulated for the creation of the 3-way crossbreeding program. The phenotype dataset included 126,000 records. Among all records, 100,000 records were associated with purebred (i.e., A, B, and C) animals randomly sampled among all purebred animals from generations 204 until 208. A total of 16,000 records were associated with A(BC) crossbred animals randomly sampled among all A(BC) crossbred animals from generations 206 until 209. Finally, 10,000 records were associated with BC crossbred dams. Average numbers of purebred and crossbred animals per generation with a phenotype are given in Supplementary Table S1.

The genotype dataset included 89,000 genotypes. This included all 26,000 phenotyped BC and A(BC) crossbred animals. A total of 48,000 genotypes were from purebred (i.e., A, B, and C) animals randomly sampled among all purebred animals from generations 205 until 208, regardless of whether they had a phenotype. A further 15,000 genotypes were from purebred (i.e., A, B, and C) animals randomly sampled among all purebred animals from generation 209. These 15,000 animals did not have phenotypes and are hereafter considered as selection candidates. Average numbers of purebred and crossbred animals per generation with a phenotype and a genotype are provided in Supplementary Table S2.
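The composition of the phenotype and genotype datasets can be tallied in a few lines to confirm that the reported subtotals add up. The dictionary keys are descriptive labels chosen for this sketch, not names used by the authors.

```python
# Phenotype records by group, as reported in the text.
phenotypes = {
    "purebred A, B, C (generations 204 to 208)": 100_000,
    "A(BC) crossbreds (generations 206 to 209)": 16_000,
    "BC crossbred dams": 10_000,
}

# Genotypes by group, as reported in the text.
genotypes = {
    "phenotyped BC and A(BC) crossbreds": 26_000,
    "purebred A, B, C (generations 205 to 208)": 48_000,
    "purebred selection candidates (generation 209)": 15_000,
}

total_phenotypes = sum(phenotypes.values())
total_genotypes = sum(genotypes.values())

# All phenotyped crossbreds were genotyped.
phenotyped_crossbreds = (
    phenotypes["A(BC) crossbreds (generations 206 to 209)"]
    + phenotypes["BC crossbred dams"]
)
```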