Genotypic data and GWAS

Hong Zhang
Jiayue Zhang
Qingyu Xu
Dandan Wang
Hong Di
Jun Huang
Xiuwei Yang
Zhoufei Wang
Lin Zhang
Ling Dong
Zhenhua Wang
Yu Zhou

The association panel was genotyped with the Illumina Maize SNP50 BeadChip, which yielded 56,110 SNPs for the population. SNPs were removed if they had a missing rate > 20%, a minor allele frequency (MAF) < 0.05, or heterozygosity > 20% [25, 42, 43]. This left 40,697 SNPs for the association analysis. Population structure was assessed with STRUCTURE 2.3 using a subset of 7742 SNPs distributed across the genome [45], and ∆K was calculated with StructureHarvester [46]. Kinship among the 222 inbred lines was estimated with TASSEL 5.0.
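The three SNP quality-control filters can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. Genotypes are assumed to be coded as allele counts (0 or 2 = homozygous, 1 = heterozygous, None = missing). Heterozygosity is taken here as the share of heterozygous calls among non-missing calls. The SNP names and toy matrix are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Assumed coding: 0 or 2 = homozygous, 1 = heterozygous, None = missing call
Genotype = Optional[int]


def snp_passes_qc(calls: Sequence[Genotype],
                  max_missing: float = 0.20,
                  min_maf: float = 0.05,
                  max_het: float = 0.20) -> bool:
    """Apply the missing-rate, MAF and heterozygosity filters to one SNP."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return False
    # Fraction of lines without a genotype call; > 20% is removed
    missing_rate = 1 - len(observed) / len(calls)
    if missing_rate > max_missing:
        return False
    # Allele frequency over diploid calls; MAF is the rarer allele's frequency
    allele_freq = sum(observed) / (2 * len(observed))
    maf = min(allele_freq, 1 - allele_freq)
    if maf < min_maf:
        return False
    # Share of heterozygous calls among observed calls; > 20% is removed
    het_rate = sum(1 for g in observed if g == 1) / len(observed)
    return het_rate <= max_het


def filter_snps(matrix: Dict[str, Sequence[Genotype]]) -> List[str]:
    """Return the IDs of SNPs that pass all three filters, in input order."""
    return [snp for snp, calls in matrix.items() if snp_passes_qc(calls)]


# Hypothetical toy panel of 10 lines
toy = {
    "snp_ok":      [0, 0, 2, 2, 0, 2, 0, 0, 2, 0],
    "snp_missing": [0, None, None, None, 2, 0, 2, 0, 0, 2],
    "snp_rare":    [0] * 10,
    "snp_het":     [1, 1, 1, 0, 2, 0, 2, 0, 2, 0],
}
print(filter_snps(toy))  # ['snp_ok']
```

Each toy SNP fails exactly one filter: `snp_missing` has 30% missing calls, `snp_rare` is monomorphic, and `snp_het` has 30% heterozygous calls. The comparison operators follow the text, so a SNP sitting exactly at 20% missing or MAF = 0.05 is kept.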

GWAS was performed with the mixed linear model (MLM) in TASSEL 5.0 [47] for the following 14 traits: RGR, RGL, RRL, RRSA, RRV, RGI, RVI, RSVI, XYRGR, XYRGL, XYRSVI, KSRGR, KSRGL and KSRSVI. The GEC software (http://grass.cgs.hku.hk/gec/estimateB.php?function=Bonferroni) was used to estimate the effective number of markers (Ne) and the recommended threshold (0.05/Ne) for deciding whether a SNP was significantly associated with any of the 14 traits. The Bonferroni-corrected threshold (0.05/23,398 ≈ 2.14e-6, i.e., -log10(P) ≈ 5.67) proved too conservative, because very few SNPs passed it for the 14 traits. A less stringent threshold of -log10(P) > 4 was therefore used to detect significant association signals [25, 43, 48]. Manhattan plots were generated with the CMplot package in R.
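The threshold arithmetic above can be checked with a short Python sketch. The helper names are hypothetical. The `p_values` dictionary stands in for a per-SNP P-value table such as one exported from an MLM run, and its values are invented for illustration.

```python
import math
from typing import Dict, List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """P-value cutoff after Bonferroni correction: alpha / number of (effective) tests."""
    return alpha / n_tests


def neglog10(p: float) -> float:
    """-log10(P); P must lie in (0, 1]."""
    return -math.log10(p)


def significant_snps(p_values: Dict[str, float], min_score: float) -> List[str]:
    """SNPs whose -log10(P) exceeds the cutoff, in input order."""
    return [snp for snp, p in p_values.items() if neglog10(p) > min_score]


strict_p = bonferroni_threshold(0.05, 23_398)
strict_score = neglog10(strict_p)
print(f"{strict_p:.3e}  {strict_score:.2f}")  # 2.137e-06  5.67

# Hypothetical P-values for three SNPs
p_values = {"snp_a": 1e-5, "snp_b": 2e-4, "snp_c": 5e-7}
print(significant_snps(p_values, 4.0))           # ['snp_a', 'snp_c']
print(significant_snps(p_values, strict_score))  # ['snp_c']
```

With these toy values, `snp_a` (-log10(P) = 5) passes the relaxed cutoff of 4 but not the Bonferroni cutoff of about 5.67. This is the kind of signal the less stringent threshold is meant to retain.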

Linkage disequilibrium (LD) was measured as r² between all SNPs with less than 25% missing data, chromosome by chromosome, using PopLDdecay 3.30 (https://github.com/BGI-shenzhen/PopLDdecay). Significantly associated SNPs on the same chromosome that were separated by less than the LD decay distance were merged into a single locus. For each locus, candidate genes were searched within one LD decay distance upstream and downstream of the peak SNP, defined as the SNP with the highest -log10(P). Loci were mapped and gene information was retrieved using the B73 RefGen_V4 gene models from MaizeGDB (http://www.maizegdb.org/).
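The locus-merging and candidate-window rules above can be sketched as follows, under one reading of the text. Significant SNPs are sorted by position, and adjacent SNPs closer than their chromosome's LD decay distance are chained into one locus (single-linkage grouping). The decay distances, SNP names and gene intervals below are hypothetical placeholders. In practice, gene coordinates would come from the B73 RefGen_V4 annotation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Hit:
    snp: str
    chrom: str
    pos: int          # physical position in bp
    score: float      # -log10(P)


@dataclass
class Locus:
    chrom: str
    hits: List[Hit]
    peak: Hit         # SNP with the highest -log10(P) in the locus
    start: int        # candidate window = peak position +/- LD decay distance
    end: int


def define_loci(hits: List[Hit], decay_bp: Dict[str, int]) -> List[Locus]:
    """Chain significant SNPs closer than the chromosome's LD decay distance into loci."""
    groups: List[List[Hit]] = []
    for h in sorted(hits, key=lambda x: (x.chrom, x.pos)):
        last = groups[-1][-1] if groups else None
        if last is not None and last.chrom == h.chrom and h.pos - last.pos < decay_bp[h.chrom]:
            groups[-1].append(h)
        else:
            groups.append([h])
    loci: List[Locus] = []
    for group in groups:
        peak = max(group, key=lambda x: x.score)
        d = decay_bp[peak.chrom]
        loci.append(Locus(peak.chrom, group, peak, max(1, peak.pos - d), peak.pos + d))
    return loci


Gene = Tuple[str, str, int, int]  # hypothetical record: (gene_id, chrom, start, end)


def genes_in_locus(locus: Locus, genes: List[Gene]) -> List[str]:
    """IDs of genes whose interval overlaps the locus window."""
    return [gid for gid, chrom, start, end in genes
            if chrom == locus.chrom and start <= locus.end and end >= locus.start]


# Hypothetical LD decay distances, significant hits and gene models
decay = {"1": 100_000, "5": 50_000}
hits = [
    Hit("S1_1000000", "1", 1_000_000, 4.5),
    Hit("S1_1050000", "1", 1_050_000, 5.2),
    Hit("S1_3000000", "1", 3_000_000, 4.1),
    Hit("S5_200000", "5", 200_000, 4.8),
]
genes = [
    ("gene_A", "1", 1_120_000, 1_125_000),
    ("gene_B", "1", 1_200_000, 1_210_000),
    ("gene_C", "5", 240_000, 260_000),
]
for locus in define_loci(hits, decay):
    print(locus.chrom, locus.peak.snp, locus.start, locus.end, genes_in_locus(locus, genes))
```

The two chromosome 1 hits 50 kb apart collapse into one locus centred on the stronger SNP, while the hit at 3 Mb forms its own locus. Windows are anchored on the peak SNP rather than on the outermost hits, which matches the "upstream and downstream of the peak" wording.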
