Genetic Diversity and Population Structure

Seema Parveen
Nutan Singh
Arjun Adit
Suman Kumaria
Rajesh Tandon
Manu Agarwal
Arun Jagannath
Shailendra Goel

To estimate within-population diversity, genetic diversity parameters were calculated using GenAlEx 6.5 and the poppr R package. The observed mean number of alleles (Ao), mean number of effective alleles (Ne), percentage of polymorphic loci (PPL), number of private alleles (Ap), and mean observed heterozygosity (Ho) were calculated in GenAlEx. Nei’s unbiased gene diversity, or expected heterozygosity (He; Nei, 1978), the Shannon–Wiener index of diversity (H; Shannon, 2001), Simpson’s index (lambda; Simpson, 1949), and evenness (E5) were calculated in poppr. Private alleles were defined as alleles found in only one population; alleles observed only once were discarded because they could reflect genotyping errors.
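The indices named above have standard closed forms, which the following minimal Python sketch illustrates. It is not the GenAlEx or poppr implementation: the function names `diversity_indices` and `private_alleles` and the input layouts are hypothetical, and the formulas are the textbook ones applied to a plain list of counts. If the counts are allele copies at one locus, the n/(n-1) factor corresponds to the small-sample correction of Nei (1978).

```python
import math
from collections import Counter


def diversity_indices(counts):
    """Textbook diversity indices for a list of category counts.

    `counts` is an assumed input layout (e.g. allele-copy or genotype counts).
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    p = [c / n for c in counts if c > 0]
    sum_p2 = sum(pi * pi for pi in p)
    shannon = -sum(pi * math.log(pi) for pi in p)   # H = -sum p ln p
    simpson = 1.0 - sum_p2                          # lambda = 1 - sum p^2
    unbiased = simpson * n / (n - 1)                # n/(n-1) correction
    # E5 = (1/sum p^2 - 1) / (e^H - 1); undefined with a single category
    e5 = (1.0 / sum_p2 - 1.0) / (math.exp(shannon) - 1.0) if len(p) > 1 else float("nan")
    return {"H": shannon, "lambda": simpson, "He": unbiased, "E5": e5}


def private_alleles(alleles_by_pop):
    """Alleles seen in exactly one population and observed more than once.

    `alleles_by_pop` maps a population name to the list of allele
    observations at one locus (hypothetical layout).
    """
    totals = Counter()
    carriers = {}
    for pop, alleles in alleles_by_pop.items():
        for allele, c in Counter(alleles).items():
            totals[allele] += c
            carriers.setdefault(allele, set()).add(pop)
    private = {pop: [] for pop in alleles_by_pop}
    for allele, pops in carriers.items():
        # singletons are dropped as possible genotyping errors
        if len(pops) == 1 and totals[allele] > 1:
            private[next(iter(pops))].append(allele)
    return {pop: sorted(found) for pop, found in private.items()}
```

For example, `private_alleles({"P1": [100, 100, 102], "P2": [100, 106, 106]})` reports allele 106 as private to P2, while allele 102 is dropped because it was observed only once.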

To evaluate genetic structure in the dataset, we used three methods: the model-based Bayesian clustering implemented in STRUCTURE version 2.3.4 (Pritchard et al., 2010), the maximum likelihood (ML) estimation implemented in SNAPCLUST (Beugin et al., 2018), and the model-free discriminant analysis of principal components (DAPC; Jombart et al., 2010).

STRUCTURE uses a Markov Chain Monte Carlo (MCMC) approach to estimate each individual’s admixture proportions for a predefined number of populations, K (Pritchard et al., 2010). We ran STRUCTURE version 2.3.4 for 150,000 MCMC replications after 50,000 burn-in steps. About 10 replicates were performed for each K value from 1 to 10. The optimum K was estimated using the web-based program STRUCTURE HARVESTER (Earl, 2012), which applies the method of Evanno et al. (2005) and searches for a mode in the ΔK distribution.
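The ΔK statistic of Evanno et al. (2005) can be sketched in a few lines of Python. This is an illustration, not the STRUCTURE HARVESTER code: the function name `evanno_delta_k` and the input layout (a dict mapping each K to the list of ln P(D) values from its replicate runs) are assumptions, and the statistic is taken as the absolute second difference of the mean log-likelihood divided by the standard deviation across replicates at K.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Delta K = |m(K+1) - 2*m(K) + m(K-1)| / sd(K), where m is the mean
    ln P(D) over replicates. Defined only for K with both neighbours present."""
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # first and last K have no second difference
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # Delta K is undefined when replicates are identical
        delta[k] = abs(means[k + 1] - 2 * means[k] + means[k - 1]) / sd
    return delta


# Made-up replicate log-likelihoods, for illustration only
runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -892.0],
    4: [-885.0, -887.0],
}
delta = evanno_delta_k(runs)
best_k = max(delta, key=delta.get)  # K at the mode of Delta K
```

Because ΔK needs both neighbours, it cannot be evaluated at the smallest or largest K tested, so K = 1 can never be selected by this criterion alone.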

DAPC combines the advantages of principal component analysis (PCA) and discriminant analysis (DA) (Jombart et al., 2010). It first transforms the data using PCA and then performs DA on the retained principal components (PCs). The number of genetic clusters was determined by K-means clustering of the principal components using the find.clusters function in the ADEGENET 2.1.1 package (Jombart, 2008). K-means was run sequentially for K values from 1 to 10, and the clustering solutions were compared using the Bayesian Information Criterion (BIC). The best-fit K was taken as the point at which the curve of BIC values plotted against K showed an elbow. The cross-validation function xvalDapc was used to determine the minimum number of PCs to retain for accurate assignment of individuals to groups. Finally, individuals were assigned to clusters based on their posterior membership probabilities, which were taken as a measure of the admixture proportion originating in each cluster.
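Reading an elbow off a BIC curve is usually done by eye. One simple way to make it reproducible, shown here as a heuristic and not as anything ADEGENET does, is to pick the K where the curve bends most sharply, that is, where the second difference of BIC is largest. The function name `elbow_k` and the BIC values below are hypothetical.

```python
def elbow_k(bic_by_k):
    """Return the K with the largest second difference of BIC.

    A large positive second difference marks the change from a steep
    decline to a plateau. This is a heuristic stand-in for visual inspection.
    """
    bends = {}
    for k in sorted(bic_by_k):
        if k - 1 in bic_by_k and k + 1 in bic_by_k:
            bends[k] = bic_by_k[k - 1] - 2 * bic_by_k[k] + bic_by_k[k + 1]
    if not bends:
        raise ValueError("need BIC values for at least three consecutive K")
    return max(bends, key=bends.get)


# Made-up BIC values for K = 1..6, for illustration only
bic = {1: 200.0, 2: 150.0, 3: 120.0, 4: 118.0, 5: 117.0, 6: 117.5}
k_elbow = elbow_k(bic)
```

The heuristic can disagree with visual inspection when the curve is noisy or declines smoothly, so it is best treated as a cross-check on the plot.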

SNAPCLUST combines the advantages of model-based approaches such as STRUCTURE and geometric approaches such as DAPC, and provides a fast maximum-likelihood solution to the genetic clustering problem (Beugin et al., 2018). We used the snapclust.choose.k function to select the most appropriate number of clusters (K), taking the K at which the Bayesian Information Criterion (BIC) was lowest. After choosing K, we ran the clustering analysis using the snapclust function implemented in the R package ADEGENET.

To further assess population genetic structure and evaluate the genetic relationships among individuals collected from different locations, an unrooted neighbor-joining (NJ) tree and a principal coordinate analysis (PCoA) were generated from the pairwise Nei’s genetic distance matrix. The PCoA was performed using the dudi.pco function in the ade4 R package (Dray and Dufour, 2007), and the NJ tree was constructed using the nj function in the ape R package (Paradis and Schliep, 2019).
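The distance matrix that feeds both the NJ tree and the PCoA can be illustrated with a short Python sketch. The text does not say which of Nei's distances was used, so the sketch assumes Nei's (1972) standard genetic distance, D = -ln(Jxy / sqrt(Jx Jy)); the function names and the input layout (one allele-frequency dict per locus) are hypothetical.

```python
import math


def nei_standard_distance(freq_x, freq_y):
    """Nei's (1972) standard distance between two samples.

    Each argument is a list with one dict per locus mapping allele -> frequency.
    """
    if not freq_x or len(freq_x) != len(freq_y):
        raise ValueError("both samples need the same non-empty set of loci")
    jx = jy = jxy = 0.0
    for locus_x, locus_y in zip(freq_x, freq_y):
        jx += sum(p * p for p in locus_x.values())
        jy += sum(p * p for p in locus_y.values())
        jxy += sum(p * locus_y.get(allele, 0.0) for allele, p in locus_x.items())
    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus
    # per-locus averaging cancels in the ratio, so raw sums are sufficient
    return max(0.0, -math.log(jxy / math.sqrt(jx * jy)))


def pairwise_matrix(freqs):
    """Symmetric distance matrix for a dict of name -> per-locus frequencies."""
    names = sorted(freqs)
    matrix = [[nei_standard_distance(freqs[a], freqs[b]) for b in names] for a in names]
    return names, matrix


# Made-up single-locus frequencies, for illustration only
names, dist = pairwise_matrix({
    "A": [{"a": 1.0}],
    "B": [{"a": 0.5, "b": 0.5}],
})
```

A matrix of this form is the kind of input that a PCoA or a neighbor-joining routine expects: symmetric, with zeros on the diagonal.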
