Testing GRAF-pop in comparison with existing ancestry-prediction software packages

Yumi Jin
Alejandro A. Schaffer
Michael Feolo
J. Bradley Holmes
Brandi L. Kattman

We compared the performance and prediction accuracy of GRAF-pop with those of the existing software packages EIGENSTRAT (Price et al. 2006), FastPCA (Galinsky et al. 2016), SNPweights (Chen et al. 2013), and FlashPCA2 (Abraham et al. 2017), using the dbGaP studies listed in the first subsection. The first three programs are included in the software package EIGENSOFT 7.1.2 (https://www.hsph.harvard.edu/alkes-price/software/).

We extracted the genotypes of the 10,000 fingerprint SNPs and saved them as PLINK data sets. We compared the performance of the software packages on the phs000420.v6.p3 dataset and on datasets combined from two or three studies. Missing genotypes were retained in some datasets so that the packages could be evaluated in the presence of missing data. Because different PCA programs report results on different scales and with different axis orientations, the PC1 and PC2 values generated by the PCA packages were normalized as follows: 1) genotypes of the HapMap subjects were combined with the datasets to be tested; 2) PC1 and PC2 values were treated as x, y coordinates; 3) the centroids of the three HapMap populations CEU, YOR and ASN were calculated and used as the vertices of the reference triangle ΔEFA, as mentioned above; 4) the barycentric coordinates of all subjects with respect to ΔEFA were calculated and converted back to Cartesian coordinates using the reference triangle ΔEFA0, as mentioned above; and 5) the converted Cartesian coordinates were displayed as scatter plots.
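The normalization in steps 2 to 4 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the dictionary-based inputs, and the target vertex coordinates passed as `target` are assumptions made for the example, since the actual vertices of ΔEFA0 are defined in an earlier part of the protocol and are not repeated here.

```python
def centroid(points):
    """Mean (x, y) of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def barycentric(p, a, b, c):
    """Barycentric weights of point p with respect to triangle (a, b, c)."""
    (x, y), (xa, ya), (xb, yb), (xc, yc) = p, a, b, c
    det = (yb - yc) * (xa - xc) + (xc - xb) * (ya - yc)
    if det == 0:
        raise ValueError("reference triangle is degenerate")
    wa = ((yb - yc) * (x - xc) + (xc - xb) * (y - yc)) / det
    wb = ((yc - ya) * (x - xc) + (xa - xc) * (y - yc)) / det
    return (wa, wb, 1.0 - wa - wb)


def to_cartesian(weights, a, b, c):
    """Convert barycentric weights back to (x, y) using triangle (a, b, c)."""
    wa, wb, wc = weights
    return (wa * a[0] + wb * b[0] + wc * c[0],
            wa * a[1] + wb * b[1] + wc * c[1])


def normalize_pcs(pcs, hapmap_labels, target):
    """Map PC1/PC2 values onto a common reference triangle.

    pcs:           {subject_id: (pc1, pc2)} for all subjects, HapMap included
    hapmap_labels: {subject_id: population} for the HapMap subjects only
    target:        {population: (x, y)} vertices of the fixed output triangle
                   (hypothetical values; the protocol defines its own)
    """
    pops = sorted(target)
    if len(pops) != 3:
        raise ValueError("exactly three reference populations are required")
    # Step 3: centroids of the three reference populations in PC space.
    src = [centroid([pcs[s] for s, pop in hapmap_labels.items() if pop == name])
           for name in pops]
    dst = [target[name] for name in pops]
    # Step 4: barycentric weights in the source triangle, re-expressed
    # as Cartesian coordinates in the target triangle.
    return {s: to_cartesian(barycentric(p, *src), *dst) for s, p in pcs.items()}
```

Because barycentric weights are unchanged by scaling, rotation, or reflection of the input axes, two PCA programs that differ only in those respects should give the same normalized coordinates under this sketch.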

Because GRAF-pop, like model-based approaches, can also estimate the ancestry proportions of each subject, we compared GRAF-pop with ADMIXTURE (Alexander et al. 2009). ADMIXTURE requires that no subjects in the dataset be closely related, so we used the GRAF software (Jin et al. 2017) to find related subjects and created a dataset containing only unrelated subjects for testing the software tools.
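The protocol does not state how subjects were dropped once related pairs had been identified. As one possible approach, the sketch below assumes the related pairs have already been exported as a list of subject-ID tuples (the input format and the function name `select_unrelated` are hypothetical) and greedily removes the subject with the most remaining relatives until no related pair is left.

```python
from collections import defaultdict


def select_unrelated(subjects, related_pairs):
    """Return the subjects that remain after greedily removing relatives.

    subjects:      list of subject IDs (strings)
    related_pairs: iterable of (id_a, id_b) tuples flagged as closely related
                   (assumed input; not the output format of any specific tool)
    """
    neighbors = defaultdict(set)
    for a, b in related_pairs:
        if a != b:  # ignore self-pairs
            neighbors[a].add(b)
            neighbors[b].add(a)

    removed = set()
    while True:
        # Subject with the most remaining relatives; ties broken by ID
        # so that the result is deterministic.
        worst = max(neighbors, key=lambda s: (len(neighbors[s]), s), default=None)
        if worst is None or not neighbors[worst]:
            break
        for other in neighbors.pop(worst):
            neighbors[other].discard(worst)
        removed.add(worst)

    return [s for s in subjects if s not in removed]
```

Removing the most connected subject first tends to keep more subjects than dropping one member of each pair arbitrarily, although it does not guarantee the largest possible unrelated set.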
