Dimensionality reduction analyses

N. Ezgi Altınışık
Duygu Deniz Kazancı
Ayça Aydoğan
Hasan Can Gemici
Ömür Dilek Erdal
Savaş Sarıaltun
Kıvılcım Başak Vural
Dilek Koptekin
Kanat Gürün
Ekin Sağlıcan
Daniel Fernandes
Gökhan Çakan
Meliha Melis Koruyucu
Vendela Kempe Lagerholm
Cansu Karamurat
Mustafa Özkan
Gülşah Merve Kılınç
Arda Sevkar
Elif Sürer
Anders Götherström
Çiğdem Atakuman
Yılmaz Selim Erdal
Füsun Özer
Aslı Erim Özdoğan
Mehmet Somel

We summarized the outgroup f3-statistics calculated across all pairs of individuals using multidimensional scaling (MDS) and visualized the first two dimensions. First, we built a dissimilarity matrix of pairwise genetic differences (1 − f3). We then removed pairs with fewer than 2000 overlapping SNPs and applied the “cmdscale” function in R to the resulting matrix.
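The filtering and scaling steps above can be sketched as follows. This is a minimal pure-Python stand-in, not the authors' R code: the record layout `(id_a, id_b, f3, n_snps)`, the function names, and the power-iteration solver are assumptions made for illustration, and the source does not say how individuals with filtered pairs were handled before scaling, so the sketch expects a complete matrix.

```python
import math

def filter_pairs(records, min_snps=2000):
    """Keep pairs with enough overlapping SNPs; return {(a, b): 1 - f3}."""
    return {
        (a, b): 1.0 - f3
        for a, b, f3, n_snps in records
        if n_snps >= min_snps
    }

def classical_mds(dist, k=2, iters=500):
    """Classical (Torgerson) MDS on a complete, symmetric distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    # The matrix is symmetric, so row means double as column means.
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    # Double-centred Gram matrix B = -1/2 * J D^2 J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        # Power iteration from a fixed, deterministic start vector.
        v = [math.sin(i + 1 + c) for i in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        vv = sum(x * x for x in v)
        lam = sum(x * y for x, y in zip(v, bv)) / vv  # Rayleigh quotient
        # Coordinates are the unit eigenvector scaled by sqrt(eigenvalue);
        # negative eigenvalues are clipped to zero in this sketch.
        scale = math.sqrt(max(lam, 0.0) / vv)
        for i in range(n):
            coords[i][c] = v[i] * scale
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] / vv for j in range(n)]
             for i in range(n)]
    return coords
```

In practice R's `cmdscale` (or any library eigen-solver) would replace the hand-written power iteration, which is used here only to keep the sketch dependency-free.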

To perform PCA, we used the “smartpca” (version 16000) software of EIGENSOFT (v7.2.1) (90) with the “shrinkmode: YES” and “lsqproject: YES” options to project ancient individuals onto principal components calculated from genome-wide polymorphism data of 55 present-day Western Eurasian populations (760 individuals) from the Human Origins SNP Panel (11, 30). We additionally computed confidence ellipses for Çayönü individuals with the “ellconf: 0.95” option implemented in smartpca.
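As an illustration of how the options named above fit into a smartpca parameter file, the following sketch renders one as text. The file prefixes, the population-list file name, and the helper function are hypothetical; only the “lsqproject”, “shrinkmode”, and “ellconf” settings come from the description above, and the remaining keys are the usual smartpca input and output entries.

```python
def smartpca_parfile(prefix, poplist, out_prefix, ellconf=None):
    """Render a smartpca parameter file as text (all paths are placeholders)."""
    lines = [
        f"genotypename: {prefix}.geno",
        f"snpname: {prefix}.snp",
        f"indivname: {prefix}.ind",
        f"evecoutname: {out_prefix}.evec",
        f"evaloutname: {out_prefix}.eval",
        # PCs are computed only on the populations listed in this file;
        # all other individuals (here, the ancient ones) are projected.
        f"poplistname: {poplist}",
        "lsqproject: YES",
        "shrinkmode: YES",
    ]
    if ellconf is not None:
        # Optional confidence-ellipse level, e.g. 0.95.
        lines.append(f"ellconf: {ellconf}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(smartpca_parfile("merged_dataset", "present_day_pops.txt", "pca_out",
                           ellconf=0.95))
```

The rendered text would be saved to a file and passed to smartpca with `-p`.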

We performed unsupervised model-based cluster analysis using ADMIXTURE version 1.3.0 (91). We estimated ancestry components of present-day populations in the Human Origins SNP Array Dataset after pruning for linkage disequilibrium and removing sites with a minor allele frequency (MAF) below 5% in PLINK (www.cog-genomics.org/plink/1.9/) with the parameters “--indep-pairwise 200 25 0.4” and “--maf 0.05,” which retained 179,175 SNPs. After filtering, we selected present-day Western Eurasian populations (n = 629) and merged them with ancient individuals (n = 307), similar to (92). We performed clustering from K = 2 to K = 6 with fivefold cross-validation (“--cv=5”) and 10 replicate runs with different random seeds. We used the cross-validation procedure of ADMIXTURE to choose the optimal value of K and the LargeKGreedy algorithm of CLUMPP (93) to identify the signals shared among independent runs.
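The run grid and the choice of K described above can be sketched as below. This is an assumption-laden illustration rather than the authors' pipeline: the seed values 1 to 10, the input file name, and the rule of averaging the cross-validation error over replicates before taking the minimum are all hypothetical, since the text does not state how replicates were combined. The parser relies on the “CV error (K=...): ...” lines that ADMIXTURE writes to its log.

```python
import re
from collections import defaultdict

def admixture_commands(bed, ks=range(2, 7), seeds=range(1, 11), folds=5):
    """One argument list per (K, seed) run; seeds here are placeholders."""
    return [
        ["admixture", f"--cv={folds}", f"--seed={seed}", bed, str(k)]
        for k in ks
        for seed in seeds
    ]

# Matches log lines such as "CV error (K=3): 0.52341".
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")

def mean_cv_error(log_texts):
    """Average the reported CV error per K across all supplied logs."""
    errors = defaultdict(list)
    for text in log_texts:
        for k, err in CV_LINE.findall(text):
            errors[int(k)].append(float(err))
    return {k: sum(v) / len(v) for k, v in sorted(errors.items())}

def best_k(log_texts):
    """K with the lowest mean CV error (assumed selection rule)."""
    means = mean_cv_error(log_texts)
    return min(means, key=means.get)
```

Each argument list can be handed to `subprocess.run`, with standard output captured to a per-run log for `best_k`.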

We modeled admixture proportions using the “qpAdm” software from the ADMIXTOOLS (v.7.0.2) package. We selected a reference set that is differentially related to the left populations and covers modern and ancient diversity (94). We found that the following base reference set was able to distinguish our relevant populations: Mbuti, Ust_Ishim, Kostenki14.SG, MA1, Han, Papuan, Dai, Chukchi, Mixe, CHG, Natufian, WHG, AfontovaGora3, and Iberomaurusian.

We then tested all possible two- and three-way models, using published genomes representing late Pleistocene and early Holocene populations of Central Anatolia, Zagros, and the Levant as surrogates (“left populations”) and Çayönü genomes as targets. We ran all qpAdm analyses with the “allsnps: YES” option, which is robust to low-coverage data (94). Every model lacking a Central Anatolia-related source was rejected (P < 0.05). To test for potential sex-biased admixture in the Çayönü population, we repeated the same analyses with the X-chromosome dataset. However, all runs failed, most likely because of the low coverage of our Çayönü data combined with the relatively small number of X-chromosome SNPs (220,384 SNPs).
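The exhaustive model search described above amounts to enumerating every two- and three-source combination and keeping those not rejected at P < 0.05. A minimal sketch follows; the source labels and P values in the demonstration are invented placeholders, not population names or results from the study, and the helper names are hypothetical.

```python
from itertools import combinations

def candidate_models(sources, sizes=(2, 3)):
    """All unordered source combinations of the requested sizes."""
    return [combo for k in sizes for combo in combinations(sources, k)]

def not_rejected(p_values, alpha=0.05):
    """Models whose qpAdm P value is at least alpha."""
    return [model for model, p in p_values.items() if p >= alpha]


if __name__ == "__main__":
    # Placeholder labels for illustration only.
    sources = ["CentralAnatolia_src", "Zagros_src", "Levant_src"]
    models = candidate_models(sources)
    for model in models:
        print(" + ".join(model))
    # Made-up P values, purely to show the filtering step.
    fake_p = {models[0]: 0.31, models[1]: 0.20, models[2]: 0.001}
    print(not_rejected(fake_p))
```

Each enumerated tuple would become the list of left populations (after the target) in one qpAdm run.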

In addition, we modeled the Anatolia PN populations, Barcın and Çatalhöyük, as a mixture of Anatolia PPN, Çayönü, and S Levant N populations, since there is a signal of admixture from Çayönü into these populations (table S6). To increase the resolution, we added an Anatolian Epipaleolithic individual to the reference set. Here, we used only published shotgun-sequenced genomes from Barcın to avoid technical confounding.

We compared summed probability distributions of radiocarbon (14C)-dated individuals using the “stackspd” function from the “rcarbon” package in R (95), with a window size of 100 years (fig. S9A). To test temporal overlap among individuals buried in cell building structures, we sampled ages from the calibrated probability distribution of each individual 10,000 times using the “sampleAges” function from the Bchron package in R (96) and computed the difference in dates for each pair. We then calculated the mean difference and its 95% quantile interval to assess whether the individuals may have lived in the same time period (fig. S9B).
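The resampling test above can be sketched as follows. This is a Python stand-in for the Bchron-based R procedure, under stated assumptions: each calibrated distribution is represented as a hypothetical dictionary of discrete calendar ages and probabilities, the quantiles are plain order statistics rather than R's interpolated ones, and the rule that an interval containing zero indicates possible contemporaneity is an interpretation, not something the text states.

```python
import random

def sample_ages(ages, probs, n, rng):
    """Draw n calendar ages from a discretised calibrated distribution."""
    return rng.choices(ages, weights=probs, k=n)

def date_difference(cal_a, cal_b, n=10_000, seed=1):
    """Mean and central 95% interval of (age_a - age_b) over n paired draws."""
    rng = random.Random(seed)
    draws_a = sample_ages(cal_a["ages"], cal_a["probs"], n, rng)
    draws_b = sample_ages(cal_b["ages"], cal_b["probs"], n, rng)
    diffs = sorted(a - b for a, b in zip(draws_a, draws_b))
    mean = sum(diffs) / n
    # Simple order-statistic quantiles; R's quantile() interpolates differently.
    lower = diffs[int(0.025 * (n - 1))]
    upper = diffs[int(0.975 * (n - 1))]
    return mean, lower, upper

def may_overlap(cal_a, cal_b, **kwargs):
    """True when the 95% interval of the difference includes zero (assumed rule)."""
    _, lower, upper = date_difference(cal_a, cal_b, **kwargs)
    return lower <= 0 <= upper
```

Applying `may_overlap` to every pair of individuals from the same structure gives the pairwise comparison; the fixed seed only makes the sketch reproducible.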

All plots were generated in R (97) using the ggplot2 (98) and ggpubr (99) packages. Other packages used to analyze, clean, and visualize the data were tidyverse (100), patchwork (101), reshape2 (102), ggplotify (103), ggrepel (104), emojifont (105), ggforce (106), rgdal (107), raster (108), plyr (109), MetBrewer (110), and pedsuite (111).
