To infer the ancestral components present in admixed populations and the proportion of each component in an individual’s genome, we ran ADMIXTURE [33] (v1.3.0). ADMIXTURE fits its model by maximum likelihood estimation (MLE) and uses cross-validation to identify the best-fitting model. In each run, for a given number of ancestries K, ADMIXTURE computes a cross-validation error (CVE) and estimates the proportion of each of the K ancestries in the genome of every individual in the dataset. K is increased across successive runs on the same dataset, and the value of K with the minimum CVE is taken as the number of ancestries that best explains the data. We ran ADMIXTURE on all Mainland Indian and Malaysian populations together with the EA and CSA populations of the HGDP dataset. The analysis was done in three ways: (a) without LD pruning of SNPs, (b) with LD pruning at r² = 0.1, and (c) with LD pruning at r² = 0.5. For each SNP set, we estimated the standard error of the CVE by running ADMIXTURE multiple times. The minimum CVE was observed at K = 9 in all three cases, and the lowest CVE overall was obtained at K = 9 with r² = 0.1. Plots were generated from the results for the dataset LD-pruned at r² = 0.1. Standard errors of the ancestry proportion estimates at K = 9 were obtained with the moving block bootstrap implemented in ADMIXTURE, using 1000 replicates on the dataset pruned at r² = 0.1.
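The selection of K described above can be illustrated with a minimal Python sketch. It is not the authors' script. It assumes that each ADMIXTURE run was started with cross-validation enabled, so that its log contains a line of the form `CV error (K=9): 0.52345`, and that the standard error of the CVE is taken as the standard deviation across replicate runs divided by the square root of the number of runs (the text does not state the exact formula). The function names `collect_cv_errors`, `summarise`, and `best_k` are hypothetical, and the numbers in the usage example are toy values, not results from this study.

```python
import re
import statistics
from collections import defaultdict

# Assumed log format: ADMIXTURE with cross-validation enabled prints a line
# such as "CV error (K=9): 0.52345" for each run.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9]*\.?[0-9]+)")


def collect_cv_errors(log_texts):
    """Group the CV errors found in a list of ADMIXTURE log texts by K."""
    errors = defaultdict(list)
    for text in log_texts:
        for k, cve in CV_LINE.findall(text):
            errors[int(k)].append(float(cve))
    return dict(errors)


def summarise(errors):
    """Return {K: (mean CVE, standard error of the mean)} over replicate runs."""
    summary = {}
    for k, values in errors.items():
        mean = statistics.mean(values)
        # Assumption: SE = sample SD / sqrt(n); reported as 0.0 for a single run.
        if len(values) > 1:
            se = statistics.stdev(values) / len(values) ** 0.5
        else:
            se = 0.0
        summary[k] = (mean, se)
    return summary


def best_k(summary):
    """Return the K with the lowest mean CV error."""
    return min(summary, key=lambda k: summary[k][0])


if __name__ == "__main__":
    # Toy log snippets (made-up values) standing in for replicate runs.
    toy_logs = [
        "CV error (K=2): 0.60\n",
        "CV error (K=2): 0.62\n",
        "CV error (K=3): 0.50\n",
        "CV error (K=3): 0.54\n",
    ]
    summary = summarise(collect_cv_errors(toy_logs))
    for k in sorted(summary):
        mean, se = summary[k]
        print(f"K={k}: mean CVE={mean:.4f}, SE={se:.4f}")
    print("K with minimum mean CVE:", best_k(summary))
```

In practice the same summary would be computed separately for each SNP set (unpruned, r² = 0.1, r² = 0.5) so that the minimum CVE can be compared both across K and across pruning thresholds.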