2.5. Accuracy of Genomic Prediction under a 10-Fold Cross-Validation

Jungjae Lee, Yongmin Kim, Eunseok Cho, Kyuho Cho, Soojin Sa, Youngsin Kim, Jungwoo Choi, Jinsoo Kim, Junki Hong, Taejeong Choi

Because the sample available for the prediction model was relatively small, a 10-fold cross-validation strategy was used to estimate the accuracy of the genomic prediction models. A previous study on the number of folds used in cross-validation reported a trade-off between the number of folds and the relationships between the training and testing sets [8]. Nevertheless, we used 10-fold cross-validation to maximize the size of the training data, given the limited reference data set for Korean Duroc pigs. For each trait of interest in this study (BFAT, DAYS, LMA, and PCL), and following the procedures outlined by Saatchi et al. [9], genotyped animals were split into ten groups using K-means clustering to reduce the relationships between the training and testing populations. A total of 3821 pedigree entries related to the 964 genotyped Duroc pigs was used for the K-means clustering. Table 2 gives the number of individuals within each fold, the within- and between-fold averages of amax and aij, and their standard deviations.
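The clustering step can be illustrated with a minimal sketch. It is not the authors' code: it assumes that each animal is described by its row of pedigree-based dissimilarities, d_ij = 1 - a_ij / sqrt(a_ii * a_jj), which is one common way to feed a relationship matrix to K-means, and it uses a toy 6-animal matrix with k = 2 instead of 964 animals with k = 10. The function names (`dissimilarity`, `kmeans`) and the farthest-first seeding are illustrative choices.

```python
import math

def dissimilarity(A):
    """Convert a relationship matrix to d_ij = 1 - a_ij / sqrt(a_ii * a_jj) (assumed metric)."""
    n = len(A)
    return [[1.0 - A[i][j] / math.sqrt(A[i][i] * A[j][j]) for j in range(n)]
            for i in range(n)]

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's K-means with deterministic farthest-first seeding."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing centre.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy relationship matrix: two unrelated families of three sibs each
# (0.5 within a family, 0 between families, 1 on the diagonal).
A = [[1.0 if i == j else (0.5 if i // 3 == j // 3 else 0.0) for j in range(6)]
     for i in range(6)]

folds = kmeans(dissimilarity(A), k=2)

# Each cluster serves once as the testing set; all other animals are training.
for f in sorted(set(folds)):
    testing = [i for i, lab in enumerate(folds) if lab == f]
    training = [i for i, lab in enumerate(folds) if lab != f]
    print(f"fold {f}: test={testing} train={training}")
```

Because closely related animals end up in the same cluster, each testing fold is only weakly related to its training set, which is the purpose stated above. Note that K-means folds are generally unequal in size, unlike random 10-fold splits.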

Table 2. Comparison of relationships among animals within and across clusters in K-means 10-fold cross-validation.

1 inBreC = inbreeding coefficients within clusters; 2 amax_within = the average of the amax values (the maximum relationship of each individual) within clusters; 3 amax_between = the average of the amax values between clusters (training and testing); 4 aij_within = the average of the aij values (relationships) within clusters; 5 aij_between = the average of the aij values between clusters (training and testing).
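The amax and aij summaries defined in the footnotes can be sketched as follows. This is an illustration under assumptions, not the authors' script: the relationship matrix and cluster labels are toy values, the function name `relationship_summary` is hypothetical, and the averages are pooled over all clusters for brevity rather than reported per fold.

```python
def relationship_summary(A, labels):
    """Average amax and aij, within and between clusters, pooled over animals."""
    n = len(A)
    amax_within, amax_between = [], []
    aij_within, aij_between = [], []
    for i in range(n):
        # Relationships of animal i to other animals in its own cluster ...
        within = [A[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        # ... and to animals in every other cluster.
        between = [A[i][j] for j in range(n) if labels[j] != labels[i]]
        if within:
            amax_within.append(max(within))
            aij_within.extend(within)
        if between:
            amax_between.append(max(between))
            aij_between.extend(between)

    def mean(values):
        return sum(values) / len(values)

    return {
        "amax_within": mean(amax_within),
        "amax_between": mean(amax_between),
        "aij_within": mean(aij_within),
        "aij_between": mean(aij_between),
    }

# Toy data: two clusters of three animals; 0.5 within a cluster,
# 0.0625 between clusters, 1 on the diagonal.
labels = [0, 0, 0, 1, 1, 1]
A = [[1.0 if i == j else (0.5 if labels[i] == labels[j] else 0.0625)
      for j in range(6)] for i in range(6)]

summary = relationship_summary(A, labels)
print(summary)
```

A clustering that serves its purpose should show clearly lower between-cluster than within-cluster values for both amax and aij.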

Accuracies of genomic prediction were assessed as the correlation between the MBVs of genotyped animals in each validation set and their response variables, r(ŷ, y), where y is the vector of pseudo-phenotypes (DEBVexcPA or DEBVincPA) for the validation set and ŷ is the vector of MBVs for the corresponding animals in y.
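As a minimal sketch of this accuracy measure, the Pearson correlation r(ŷ, y) can be computed separately for each validation fold. The fold labels, pseudo-phenotypes, and MBVs below are made-up toy numbers, and the names `pearson` and `fold_accuracies` are hypothetical. How the per-fold values are combined afterwards is not specified here, so the sketch stops at the per-fold correlations.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fold_accuracies(fold_ids, y, y_hat):
    """Return {fold: r(y_hat, y)} using only the animals in that validation fold."""
    result = {}
    for f in sorted(set(fold_ids)):
        idx = [i for i, fold in enumerate(fold_ids) if fold == f]
        result[f] = pearson([y_hat[i] for i in idx], [y[i] for i in idx])
    return result

# Toy example: 7 animals in 2 validation folds.
fold_ids = [0, 0, 0, 0, 1, 1, 1]
y = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0]       # pseudo-phenotypes (e.g. DEBV)
y_hat = [1.0, 3.0, 2.0, 4.0, 2.0, 4.0, 6.0]   # MBVs predicted from the training folds

acc = fold_accuracies(fold_ids, y, y_hat)
print(acc)
```

In a real analysis, each fold's ŷ would come from a model trained on the other nine folds, so no animal contributes to its own prediction.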
