Marker data analysis and germplasm classification

Zifeng Guo
Quannv Yang
Feifei Huang
Hongjian Zheng
Zhiqin Sang
Yanfen Xu
Cong Zhang
Kunsheng Wu
Jiajun Tao
Boddupalli M. Prasanna
Michael S. Olsen
Yunbo Wang
Jianan Zhang
Yunbi Xu

Haplotypes were constructed when two or more SNPs were scored from a single amplicon. Because the sampled individuals were inbred lines, the vast majority of SNP genotypes were homozygous, which made the haplotypes unambiguous. In theory, an amplicon carrying $n$ biallelic SNPs can have up to $2^n$ haplotypes; this expression was used to determine the theoretical number of haplotypes for each mSNP locus.
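The haplotype construction described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the input layout (one homozygous allele call per SNP per line, `None` for a missing call), and the rule that a haplotype with any missing SNP is itself treated as missing are all assumptions made for the example.

```python
from collections import Counter

def theoretical_haplotypes(n_snps):
    """Upper bound on distinct haplotypes for n biallelic SNPs in one amplicon."""
    return 2 ** n_snps

def build_haplotypes(calls):
    """Join per-SNP homozygous calls into one haplotype per line.

    calls: dict mapping line name -> list of allele calls (None = missing).
    Assumption: a line with any missing SNP gets a missing haplotype (None).
    """
    haps = {}
    for line, alleles in calls.items():
        if any(a is None for a in alleles):
            haps[line] = None
        else:
            # A tuple keeps multi-base alleles (e.g. indels) unambiguous.
            haps[line] = tuple(alleles)
    return haps

# Hypothetical calls for three SNPs scored from one amplicon.
calls = {
    "line_A": ["A", "C", "T"],
    "line_B": ["A", "C", "T"],
    "line_C": ["G", "C", "T"],
    "line_D": ["G", None, "T"],
}
haps = build_haplotypes(calls)
observed = Counter(h for h in haps.values() if h is not None)
upper_bound = theoretical_haplotypes(3)
```

Here two distinct haplotypes are observed among the three fully called lines, against a theoretical maximum of eight for three biallelic SNPs.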

Marker analysis was performed at three levels. At the level of marker types, four marker types were derived from the 40K mSNP mother panel: 40K high-PIC SNPs (the SNP with the highest polymorphism information content, PIC, from each mSNP), 40K random SNPs (a SNP with an intermediate PIC value from each mSNP), 251K SNPs (all the SNPs across the 40K mSNP loci), and 159K haplotypes (minor allele frequency, MAF, > 5%). At the level of genomic regions, SNP markers were classified into five categories: UTR5, intergenic, CDS, intronic, and UTR3. At the level of marker alleles, data analysis was performed for di-allelic SNPs and indels.
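As an illustration of how one SNP per mSNP locus might be drawn from the full SNP set, the sketch below picks the highest-PIC SNP and an intermediate-PIC SNP for each locus. The input layout, the function name, and the use of the lower-middle rank as the "intermediate" SNP are assumptions for the example; the source does not specify how the intermediate SNP was chosen.

```python
def pick_per_locus(snps):
    """Select one high-PIC and one intermediate-PIC SNP per mSNP locus.

    snps: iterable of (locus_id, snp_id, pic) tuples.
    Assumption: "intermediate" is taken as the lower-middle rank by PIC.
    """
    by_locus = {}
    for locus, snp, pic in snps:
        by_locus.setdefault(locus, []).append((pic, snp))

    picks = {}
    for locus, items in by_locus.items():
        ranked = sorted(items)  # ascending PIC
        picks[locus] = {
            "high_pic": ranked[-1][1],
            "intermediate": ranked[(len(ranked) - 1) // 2][1],
        }
    return picks

# Hypothetical PIC values for two mSNP loci.
snps = [
    ("m1", "s1", 0.10), ("m1", "s2", 0.35), ("m1", "s3", 0.22),
    ("m2", "s4", 0.30), ("m2", "s5", 0.05),
]
picks = pick_per_locus(snps)
```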

The missing rate, MAF, and heterozygosity were calculated for each SNP locus and haplotype. PIC, as described by Botstein et al. (1980), measures the relative value of each marker with respect to the amount of polymorphism it exhibits, and was estimated by

$$\mathrm{PIC} = 1 - \sum_{i=1}^{n} P_i^2 - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2 P_i^2 P_j^2$$

where $P_i$ and $P_j$ are the population frequencies of the $i$th and the $j$th alleles and $n$ is the number of alleles. Gene diversity (GD) depends on the sum of squares of the allele frequencies and was estimated as:

$$\mathrm{GD} = 1 - \sum_{i=1}^{n} P_i^2$$

where $P_i$ is the frequency of the $i$th allele. The genetic distance between two genotypes was calculated as the average nucleotide difference between them in TASSEL 5.0 (Bradbury et al., 2007). Genomic divergence between populations and pairwise nucleotide diversity within a population were calculated as the average of these distances over all genotype pairs between populations and within a population, respectively. The maize germplasm groups were compared based on PIC, GD, and allele frequency difference.
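The per-locus statistics above can be sketched in a few lines. This is an illustrative calculation under stated assumptions, not the authors' code: genotypes are given as diploid allele pairs with `None` for missing calls, the function name is hypothetical, and MAF is taken as the frequency of the second most common allele (which equals the minor allele frequency for a biallelic locus).

```python
from collections import Counter
from itertools import combinations

def locus_stats(genotypes):
    """Missing rate, heterozygosity, MAF, GD and PIC for one locus.

    genotypes: list of (allele1, allele2) tuples, or None for a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this locus")

    missing_rate = 1 - len(called) / len(genotypes)
    heterozygosity = sum(1 for a, b in called if a != b) / len(called)

    counts = Counter(allele for g in called for allele in g)
    total = sum(counts.values())
    freqs = sorted((c / total for c in counts.values()), reverse=True)
    # Assumption: MAF = frequency of the second most common allele.
    maf = freqs[1] if len(freqs) > 1 else 0.0

    gd = 1 - sum(p * p for p in freqs)                 # GD = 1 - sum(P_i^2)
    pic = gd - sum(2 * p * p * q * q                   # Botstein et al. (1980)
                   for p, q in combinations(freqs, 2))
    return {"missing_rate": missing_rate, "heterozygosity": heterozygosity,
            "maf": maf, "gd": gd, "pic": pic}

# Hypothetical locus: 6 A/A lines, 2 G/G lines, 2 missing calls.
genotypes = [("A", "A")] * 6 + [("G", "G")] * 2 + [None] * 2
stats = locus_stats(genotypes)
```

For this example the allele frequencies are 0.75 and 0.25, so GD = 1 - (0.5625 + 0.0625) = 0.375 and PIC = 0.375 - 2(0.5625)(0.0625) ≈ 0.305.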

Cluster analysis was performed using the unweighted pair group method with arithmetic mean (UPGMA), and groups were identified from the resulting phylogenetic tree. Principal component analysis (PCA) was performed using TASSEL 5.0.
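To make the distance and clustering steps concrete, the sketch below computes a simple pairwise distance (the fraction of jointly called sites at which two homozygous genotypes differ) and builds a UPGMA tree from it. It is a simplified stand-in written for illustration and does not reproduce TASSEL's implementation; the function names, input layout, and toy genotypes are assumptions.

```python
from itertools import combinations

def pairwise_distance(g1, g2):
    """Fraction of jointly called sites where two homozygous genotypes differ."""
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no jointly called sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def upgma(names, dist):
    """Build a UPGMA tree as nested tuples.

    dist: dict mapping frozenset({x, y}) -> distance for every pair of names.
    """
    sizes = {name: 1 for name in names}   # cluster label -> number of leaves
    d = dict(dist)
    while len(sizes) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        for c in sizes:
            # Size-weighted mean equals the average over all leaf pairs.
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))

# Hypothetical homozygous genotypes at six SNP sites.
lines = {"A": "AACCGG", "B": "AACCGT", "C": "TTGGAA", "D": "TTGGAC"}
dist = {frozenset((x, y)): pairwise_distance(lines[x], lines[y])
        for x, y in combinations(lines, 2)}
tree = upgma(list(lines), dist)
```

In this toy data set lines A and B differ at one site in six, as do C and D, while every between-pair comparison differs at all six sites, so the tree groups A with B and C with D before joining the two pairs.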
