Principal component analysis, k-means, and ADMIXTURE (dx.doi.org/10.17504/protocols.io.bkwbkxan in protocols.io)

Israel Aguilar-Ordoñez; Fernando Pérez-Villatoro; Humberto García-Ortiz; Francisco Barajas-Olmos; Judith Ballesteros-Villascán; Ram González-Buenfil; Cristobal Fresno; Alejandro Garcíarrubio; Juan Carlos Fernández-López; Hugo Tovar; Enrique Hernández-Lemus; Lorena Orozco; Xavier Soberón; Enrique Morett

Concise Method

This method is extracted from the research article: PLoS One, Apr 2021

Whole genome variation in 27 Mexican indigenous populations, demographic and biomedical insights

DOI: 10.1371/journal.pone.0249773

The pipeline for running PCA, k-means, and ADMIXTURE from a single dataset can be downloaded from https://github.com/jbv2/VCF2PCP. In brief, from the IPVS we kept only our NM samples and the 4 NP individuals from the 1000 Genomes Project (sample IDs: HG01926, HG01938, HG01961, HG02272). We kept biallelic SNVs with MAF > 0.05 using bcftools v1.9-220-gc65ba41, and removed variants in linkage disequilibrium (r² > 0.85) with the bcftools +prune plugin using the parameters --window 2000bp --nsites-per-win 1. We then converted the VCF files to Eigenstrat format. PCA was performed with smartpca from Eigensoft v6.1.4 [72], requesting numoutevec: 20. We kept the eigenvectors with P-value < 0.01 and recalculated the percentage of variance explained by each of them, taking the sum of the selected eigenvalues as 100%. For the k-means analysis, we used the average silhouette method to determine the optimal number of clusters. For the ADMIXTURE v1.3 [20] analysis, we used the --seed 43 parameter.
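
The variant filtering was done with bcftools, and the protocol does not state which tool performed the VCF-to-Eigenstrat conversion. To make the underlying logic explicit, here is a minimal plain-Python sketch, not the VCF2PCP code. It assumes diploid GT strings have already been extracted from the VCF. It treats a site as a biallelic SNV only when REF and ALT are single bases, computes MAF over called genotypes, and encodes each kept site as an Eigenstrat .geno row: the number of reference-allele copies per individual, with 9 for missing data. All function names and the toy records are hypothetical. The LD-pruning step is not reproduced.

```python
# Sketch only: plain-Python stand-in for the bcftools filtering and the
# VCF-to-Eigenstrat conversion; not the VCF2PCP implementation.


def alt_allele_count(gt):
    """Number of ALT alleles in a diploid GT string ("0/1", "1|1"); None if missing."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None
    return sum(int(a) for a in alleles)


def is_biallelic_snv(ref, alt):
    """True when REF and ALT are single bases and ALT lists only one allele."""
    return len(ref) == 1 and len(alt) == 1 and ref in "ACGT" and alt in "ACGT"


def minor_allele_frequency(alt_counts):
    """MAF over called genotypes; missing genotypes (None) are ignored."""
    called = [c for c in alt_counts if c is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def geno_row(alt_counts):
    """Eigenstrat .geno row: copies of the REF allele per sample, 9 = missing.

    Assumes the matching .snp line lists the VCF REF as the reference allele.
    """
    return "".join("9" if c is None else str(2 - c) for c in alt_counts)


def filter_and_encode(records, min_maf=0.05):
    """Yield (variant_id, geno_row) for biallelic SNVs with MAF > min_maf."""
    for var_id, ref, alt, genotypes in records:
        if not is_biallelic_snv(ref, alt):
            continue  # drops indels and multiallelic sites such as ALT "A,C"
        counts = [alt_allele_count(gt) for gt in genotypes]
        if minor_allele_frequency(counts) > min_maf:
            yield var_id, geno_row(counts)


# Toy records: (ID, REF, ALT, one GT string per sample); values are invented.
records = [
    ("v1", "A", "G", ["0/0", "0/1", "1/1", "./."]),
    ("v2", "C", "T", ["0/0", "0/0", "0/0", "0/0"]),    # monomorphic, MAF = 0
    ("v3", "A", "AT", ["0/1", "0/1", "0/0", "0/0"]),   # indel, not an SNV
    ("v4", "G", "A,C", ["0/1", "0/2", "0/0", "0/0"]),  # multiallelic
]
for var_id, row in filter_and_encode(records):
    print(var_id, row)  # prints: v1 2109
```

Only v1 survives. Its called genotypes carry 3 ALT alleles out of 6, so MAF = 0.5. The missing fourth genotype becomes a 9 in the .geno row.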

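The eigenvector selection and the k-means step can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code, and the following details are assumptions:

- `percent_variance`, `kmeans`, `mean_silhouette`, and `best_k` are hypothetical names.
- The eigenvalues and P-values are invented. They would presumably be Tracy-Widom P-values, which Eigensoft can report, but the protocol does not name the test.
- k-means uses Lloyd's algorithm with a deterministic farthest-point initialisation, chosen here only so the toy run is reproducible.
- The toy 2-D points stand in for per-sample PC coordinates. The protocol does not say how many PCs were clustered.

```python
import math

# Sketch only: hypothetical helpers for the post-PCA steps; not VCF2PCP code.


def percent_variance(eigenvalues, pvalues, alpha=0.01):
    """Variance explained (%) per retained eigenvector; retained eigenvalues sum to 100%."""
    kept = [ev for ev, p in zip(eigenvalues, pvalues) if p < alpha]
    total = sum(kept)
    return [100 * ev / total for ev in kept]


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Average silhouette width s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; 0 is a convention chosen here
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # singleton clusters get s(i) = 0 by convention
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # self-distance is 0
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(points, k_values):
    """Return the k with the highest average silhouette, plus all scores."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Invented eigenvalues and P-values, for illustration only.
print(percent_variance([6.0, 3.0, 1.0, 0.5], [1e-10, 1e-5, 1e-3, 0.3]))  # [60.0, 30.0, 10.0]

# Toy 2-D points standing in for per-sample PC coordinates (three tight groups).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
k, scores = best_k(points, range(2, 6))
print(k)  # 3
```

Farthest-point initialisation keeps this toy deterministic. Real analyses more commonly run several randomly seeded restarts (for example with k-means++) and keep the solution with the lowest within-cluster sum of squares.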
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
