To investigate the population structure of the GWAS panel, we performed Bayesian clustering in fastSTRUCTURE (Raj et al., 2014). Numbers of clusters (K) from 1 to 10 were tested using the default priors. The chooseK.py script distributed with fastSTRUCTURE was used to estimate the range of K that reflects appropriate model complexity. The admixture proportions of each genotype were visualized as DISTRUCT plots (Rosenberg, 2004).
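As an illustration of the K scan described above, the following minimal sketch builds (but does not execute) one fastSTRUCTURE command per K from 1 to 10, followed by a chooseK.py call. The file prefixes `gwas_panel` and `fs_out/gwas_panel` are hypothetical placeholders, the interpreter name `python` is an assumption about the local installation, and no `--prior` option is passed so that the program's default prior applies.

```python
# Hypothetical file prefixes; replace with the real PLINK bed/bim/fam prefix
# and the desired output prefix.
INPUT_PREFIX = "gwas_panel"
OUTPUT_PREFIX = "fs_out/gwas_panel"


def structure_commands(k_min=1, k_max=10):
    """Return one structure.py command (as an argument list) per value of K."""
    return [
        ["python", "structure.py", "-K", str(k),
         f"--input={INPUT_PREFIX}", f"--output={OUTPUT_PREFIX}"]
        for k in range(k_min, k_max + 1)
    ]


def choose_k_command():
    """Return the chooseK.py command that summarizes the runs for all K."""
    return ["python", "chooseK.py", f"--input={OUTPUT_PREFIX}"]


commands = structure_commands()
# Print the commands instead of running them, so the sketch has no side effects.
for cmd in commands + [choose_k_command()]:
    print(" ".join(cmd))
```

Each argument list could be passed to `subprocess.run` once the paths point at real files.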
The GWAS was performed using both single- and multi-locus models, implemented in GAPIT version 3 (Wang and Zhang, 2020), mrMLM 4.0 (Zhang et al., 2020), FarmCPU (Liu et al., 2016) and the G model (Bernardo, 2013; Table 1). The G model was run in the GModel2 software. Because GModel2 requires input files without missing values, individuals with a genotyping rate below 90% and markers with missing data were excluded from this analysis. In addition, linkage disequilibrium (LD) pruning was applied in PLINK 1.9 (Chang et al., 2015) to exclude SNPs on the same chromosome with r² > 0.85. A p-value threshold of 1E-07, which is more stringent than the Bonferroni-corrected threshold, was used as the cutoff for calling significant marker-trait associations.
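The filtering applied before the G model analysis can be sketched as follows. This is only an illustration under stated assumptions: genotypes are coded as allele dosages (0/1/2) with `None` for a missing call, individuals are filtered before markers (the text does not specify the order), and r² is taken as the squared Pearson correlation of dosages, which is our reading of how pairwise LD pruning is commonly computed rather than a reproduction of PLINK's windowed procedure. All names and the toy data are hypothetical.

```python
def filter_genotypes(geno, min_call_rate=0.90):
    """Drop individuals below the call-rate cutoff, then markers with any missing call.

    geno maps an individual ID to a list of dosages (0, 1, 2 or None).
    Returns the filtered dict and the indices of the retained markers.
    """
    kept = {
        ind: calls for ind, calls in geno.items()
        if sum(c is not None for c in calls) / len(calls) >= min_call_rate
    }
    n_markers = len(next(iter(geno.values())))
    complete = [
        j for j in range(n_markers)
        if all(calls[j] is not None for calls in kept.values())
    ]
    filtered = {ind: [calls[j] for j in complete] for ind, calls in kept.items()}
    return filtered, complete


def r_squared(x, y):
    """Squared Pearson correlation between two dosage vectors (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


# Toy panel: 4 individuals x 10 markers (hypothetical data).
panel = {
    "A": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0],
    "B": [0, 1, 2, 0, 1, 2, 0, 1, 2, None],        # call rate 0.9, kept
    "C": [None, None, 2, 0, 1, 2, 0, 1, 2, 0],     # call rate 0.8, dropped
    "D": [2, 1, 0, 2, 1, 0, 2, 1, 0, 2],
}
filtered, complete = filter_genotypes(panel)
ld = r_squared([0, 0, 2], [2, 2, 0])  # perfectly (negatively) correlated pair
```

In this sketch a SNP pair on the same chromosome would be flagged for pruning when `r_squared` exceeds 0.85.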
Table 1. Models used for GWAS analysis.
All other GWAS approaches were performed on the same individuals used in the G model analysis, but without excluding markers with missing data and without LD pruning. These analyses were run in R v4.0.1 with the corresponding packages. Six models, comprising four single-locus models (MLM, GLM, CMLM and SUPER) and two multi-locus models (BLINK and MLMM), were applied using GAPIT version 3 (Wang and Zhang, 2020). The six multi-locus GWAS methods implemented in mrMLM 4.0 (mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB, and ISIS EM-BLASSO; Zhang et al., 2020) were also used. The Q matrices obtained from fastSTRUCTURE were included as covariates in the GAPIT, mrMLM and FarmCPU analyses. For both GAPIT version 3 and mrMLM 4.0, all GWAS parameters were left at their default values, and significant marker-trait associations were declared at p ≤ 0.05 after Bonferroni correction (GAPIT version 3) or at a LOD score ≥ 3 (mrMLM 4.0). For FarmCPU, the default p-value threshold of 0.01 (with Bonferroni correction) for selecting pseudo-QTNs into the model in the first iteration can be overly strict when the markers are in strong LD; we therefore set this threshold to 0.05. Markers were considered significantly associated with a trait when their p-values were below 0.01 after Bonferroni correction.
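To make the different significance criteria comparable, the sketch below computes Bonferroni-corrected cutoffs and converts a LOD score to an approximate p-value. The marker count `N_MARKERS` is a hypothetical placeholder, not the number used in this study, and the LOD conversion assumes a likelihood-ratio statistic that follows a chi-square distribution with one degree of freedom.

```python
import math


def bonferroni_threshold(alpha, n_markers):
    """Per-marker p-value cutoff for a family-wise error rate of alpha."""
    return alpha / n_markers


def lod_to_p(lod):
    """Approximate p-value for a LOD score, assuming a 1-df chi-square LRT.

    The likelihood-ratio statistic is 2 * ln(10) * LOD, and the survival
    function of a 1-df chi-square at x equals erfc(sqrt(x / 2)).
    """
    chi2 = 2.0 * math.log(10.0) * lod
    return math.erfc(math.sqrt(chi2 / 2.0))


N_MARKERS = 10_000  # hypothetical marker count, for illustration only

gapit_cutoff = bonferroni_threshold(0.05, N_MARKERS)    # GAPIT criterion
farmcpu_cutoff = bonferroni_threshold(0.01, N_MARKERS)  # FarmCPU criterion
mrmlm_p = lod_to_p(3.0)                                 # mrMLM criterion

print(f"GAPIT cutoff:   {gapit_cutoff:.2e}")
print(f"FarmCPU cutoff: {farmcpu_cutoff:.2e}")
print(f"LOD 3 as p:     {mrmlm_p:.2e}")
```

Under these assumptions a LOD score of 3 corresponds to a p-value of roughly 2E-04, which is less stringent than a Bonferroni cutoff whenever the panel has more than a few hundred markers.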
In this study, a marker-trait association was considered reliable when the marker was detected repeatedly by at least two methods and/or in at least two datasets using GAPIT version 3, mrMLM 4.0 and FarmCPU, or when a marker detected by any of these three R packages shared a haploblock with a marker detected by GModel2.
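The reliability rule above can be expressed as a small filter. The sketch below is one possible reading of the rule, in which "and/or" is treated as a logical OR; the record layout, marker names, dataset labels and haploblock IDs are all hypothetical.

```python
from collections import defaultdict

R_PACKAGES = {"GAPIT", "mrMLM", "FarmCPU"}


def reliable_markers(hits, haploblock, gmodel_markers):
    """Return markers that satisfy the reliability rule.

    hits: iterable of (marker, package, method, dataset) detections.
    haploblock: dict mapping a marker to its haploblock ID.
    gmodel_markers: set of markers detected by GModel2.
    """
    methods = defaultdict(set)
    datasets = defaultdict(set)
    for marker, package, method, dataset in hits:
        if package in R_PACKAGES:
            methods[marker].add((package, method))
            datasets[marker].add(dataset)

    # Haploblocks that contain at least one GModel2-detected marker.
    gmodel_blocks = {haploblock[m] for m in gmodel_markers if m in haploblock}

    reliable = set()
    for marker in methods:
        repeated = len(methods[marker]) >= 2 or len(datasets[marker]) >= 2
        shares_block = haploblock.get(marker) in gmodel_blocks
        if repeated or shares_block:
            reliable.add(marker)
    return reliable


# Hypothetical detections for illustration.
hits = [
    ("snp1", "GAPIT", "BLINK", "env1"),
    ("snp1", "mrMLM", "FASTmrMLM", "env1"),   # two methods
    ("snp2", "FarmCPU", "FarmCPU", "env1"),
    ("snp2", "FarmCPU", "FarmCPU", "env2"),   # two datasets
    ("snp3", "GAPIT", "MLM", "env1"),         # single hit, shares block with GModel2
    ("snp4", "GAPIT", "GLM", "env2"),         # single hit, no support
]
haploblock = {"snp1": "hb1", "snp3": "hb7", "snp4": "hb2", "snp9": "hb7"}
selected = reliable_markers(hits, haploblock, gmodel_markers={"snp9"})
print(sorted(selected))
```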
Statistical differences among the genotypes of each significantly associated marker were tested by ANOVA and Dunnett’s T3 test in SPSS v.27 (IBM®) at a significance level of p < 0.05.
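For readers unfamiliar with the first step of this comparison, the sketch below computes the one-way ANOVA F statistic for phenotype values grouped by marker genotype. It is not the SPSS procedure used here: it returns only the F statistic and its degrees of freedom (no p-value), it does not implement Dunnett's T3 post hoc test, and the phenotype values are invented for illustration.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of value lists."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    # Between-group sum of squares: group size times squared deviation of its mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum(
        sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical phenotype values for three genotype classes at one marker.
genotype_groups = [
    [1.0, 2.0, 3.0],   # e.g. AA
    [2.0, 3.0, 4.0],   # e.g. AG
    [5.0, 6.0, 7.0],   # e.g. GG
]
f_stat, df_b, df_w = one_way_anova_f(genotype_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```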