Dataset. We assembled an additional HDD by retaining, from our larger modern dataset, only the samples genotyped on the Illumina Infinium Omni2.5-8 BeadChip. In particular, we included seven populations from the 1000 Genomes Project: the five European populations (Northern European from Utah, CEU; England, GBR; Finland, FIN; Spain, IBS; Italy from Tuscany, TSI), one from Asia (Han Chinese, CHB), and one from Africa (Yoruba from Nigeria, YRI). We also retained 466 Italian samples from individuals whose four grandparents were born in the same Italian region. The Italian samples broadly clustered according to their geographical origin into Northern, Central, and Southern Italians, and Sardinians, while the TSI samples from the 1000 Genomes Project formed a separate cluster (data file S1).

From this dataset, we extracted 7164 Neanderthal SNPs tagging Neanderthal introgressed regions (24). To select which allele was inherited from Neanderthals, we chose the one from the Altai Neanderthal (26) genome when it was homozygous and the minor allele in YRI when it was heterozygous.
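The allele-selection rule described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the input representation (a pair of Altai alleles and a list of alleles observed on YRI chromosomes), and the tie-breaking behavior are all assumptions made for the example.

```python
from collections import Counter


def neanderthal_allele(altai_genotype, yri_alleles):
    """Pick the putative Neanderthal allele at one biallelic SNP.

    altai_genotype: pair of alleles in the Altai Neanderthal, e.g. ("A", "G").
    yri_alleles: alleles observed across YRI chromosomes at the same SNP.
    (Both inputs are hypothetical representations chosen for this sketch.)
    """
    first, second = altai_genotype
    if first == second:
        # Altai is homozygous: take its allele directly.
        return first
    # Altai is heterozygous: take the allele that is rarer (minor) in YRI.
    # Ties fall back to the first Altai allele, which is an assumption;
    # the text does not say how ties were handled.
    counts = Counter(yri_alleles)
    return min((first, second), key=lambda allele: counts.get(allele, 0))
```

For example, `neanderthal_allele(("A", "G"), ["A"] * 8 + ["G"] * 2)` returns `"G"`, because G is the minor allele among the ten YRI chromosomes.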

Number of Neanderthal alleles in present-day human populations. To provide a direct (relative) estimate of the Neanderthal DNA present in different individuals, we first pruned the Neanderthal tag SNPs in LD and then counted, for each sample, the number of Neanderthal alleles across all remaining tag SNPs. Then, we compared the distribution of Neanderthal allele counts across populations with the two-sample Wilcoxon rank sum test. We repeated the same analyses after removing outlier individuals.
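As an illustration of the counting and comparison steps, the sketch below sums per-SNP Neanderthal allele dosages for one individual and computes a rank-sum z statistic for two groups of counts. It is written in plain Python to stay self-contained; the function names, the dosage encoding (0, 1, 2, or `None` for a missing call), and the use of the normal approximation without a tie correction are assumptions. The software actually used for the test is not stated in the text, and in practice a statistics library routine would be preferable.

```python
import math


def count_neanderthal_alleles(dosages):
    """Sum Neanderthal allele dosages (0, 1, 2) over tag SNPs, skipping missing calls."""
    return sum(d for d in dosages if d is not None)


def rank_sum_z(x, y):
    """Wilcoxon rank-sum z statistic for samples x and y.

    Uses midranks for tied values and the normal approximation;
    no tie correction is applied to the variance (a simplification).
    """
    pooled = sorted([(value, 0) for value in x] + [(value, 1) for value in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (rank_sum_x - expected) / sd
```

A negative z indicates that the first group tends to carry fewer Neanderthal alleles than the second; swapping the two groups flips the sign.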

Basal Eurasian ancestry and Neanderthal contribution. To infer the proportion of Basal Eurasian ancestry present in European populations and investigate its impact in shaping variation in the Neanderthal legacy across populations (4, 5), we used the f4 ratio implemented in the ADMIXTOOLS package (62) in the form f4(Target, Loschbour, Ust_Ishim, Kostenki14)/f4(Mbuti, Loschbour, Ust_Ishim, Kostenki14). We used the same approach to infer Neanderthal ancestry, in the form f4(Mbuti, Chimp, Target, Altai)/f4(Mbuti, Chimp, Dinka, Altai) (fig. S9, E to H). We then performed the same analyses by grouping the modern individuals according to the CP/fS inferred clusters (see the “Analysis of modern samples” section) and retained only clusters with at least 10 samples (Fig. 4C).
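The arithmetic behind an f4 ratio can be sketched as below. This is not ADMIXTOOLS: it computes only a point estimate from per-SNP allele frequencies, using the standard definition of f4(A, B; C, D) as the mean over SNPs of (a - b)(c - d), and it omits the block-jackknife standard errors that ADMIXTOOLS reports. The function names and the toy frequencies are hypothetical; the toy target is built as an explicit mixture only so that the arithmetic can be checked.

```python
def f4(freq_a, freq_b, freq_c, freq_d):
    """Point estimate of f4(A, B; C, D) from per-SNP allele frequencies."""
    terms = [(a - b) * (c - d) for a, b, c, d in zip(freq_a, freq_b, freq_c, freq_d)]
    return sum(terms) / len(terms)


def f4_ratio(numerator_pops, denominator_pops):
    """Ratio of two f4 statistics; each argument is a 4-tuple of frequency lists."""
    return f4(*numerator_pops) / f4(*denominator_pops)


# Toy allele frequencies at three SNPs (made up for illustration only).
mbuti = [0.1, 0.8, 0.3]
loschbour = [0.5, 0.2, 0.9]
ust_ishim = [0.4, 0.6, 0.2]
kostenki14 = [0.7, 0.1, 0.5]

# Hypothetical target constructed as 25% Mbuti-like plus 75% Loschbour-like.
alpha = 0.25
target = [alpha * m + (1 - alpha) * l for m, l in zip(mbuti, loschbour)]

estimate = f4_ratio(
    (target, loschbour, ust_ishim, kostenki14),
    (mbuti, loschbour, ust_ishim, kostenki14),
)
```

Because the toy target differs from Loschbour only by `alpha` times the Mbuti-Loschbour difference, the ratio recovers `alpha` up to floating-point error. Real data would need many SNPs and resampling-based uncertainty estimates.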

African ancestry and Neanderthal legacy. We evaluated the impact of African contributions on the amount of Neanderthal ancestry in two ways. First, we removed the clusters showing African gene flow, as detected by the GT analysis (Fig. 3), and tested how this affected the correlation between the Basal Eurasian and Neanderthal estimates. Second, we removed the individuals belonging to these clusters and tested how this affected the degree of population differentiation in the amount of Neanderthal alleles (Supplementary Materials and fig. S9, A to D).

Comparison of Neanderthal allele frequencies across modern populations. We searched for marked differences in the frequencies of Neanderthal alleles across populations by computing, for every SNP, the allele frequency difference between each possible pair of the 11 populations in our dataset, thus obtaining 55 distributions (see the Supplementary Materials). Then, the NTT SNPs, i.e., the Neanderthal-tag SNPs in the top 1% of each distribution, were selected (data file S6).
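The pairwise comparison can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and input layout are hypothetical, differences are taken as absolute values (the text does not say whether signed or absolute differences were ranked), and at least one SNP is kept per pair even when 1% of the SNPs rounds down to zero.

```python
from itertools import combinations


def pairwise_top_snps(freqs, top_fraction=0.01):
    """Return, for each population pair, the SNPs with the largest frequency differences.

    freqs: dict mapping population name -> dict mapping SNP id -> Neanderthal
    allele frequency (a hypothetical layout chosen for this sketch).
    """
    top_by_pair = {}
    for pop_a, pop_b in combinations(sorted(freqs), 2):
        shared = freqs[pop_a].keys() & freqs[pop_b].keys()
        # Absolute difference is an assumption; see the note above.
        diffs = {snp: abs(freqs[pop_a][snp] - freqs[pop_b][snp]) for snp in shared}
        n_top = max(1, int(len(diffs) * top_fraction))
        ranked = sorted(diffs, key=diffs.get, reverse=True)
        top_by_pair[(pop_a, pop_b)] = ranked[:n_top]
    return top_by_pair
```

With 11 populations, `combinations` yields 11 × 10 / 2 = 55 pairs, matching the 55 distributions described in the text.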

The biological implications of Neanderthal introgression. Given the list of genes overlapping the Neanderthal introgressed regions harboring the NTT SNPs and the list of genes directly harboring the NTT SNPs, we performed different enrichment tests with the online tool EnrichR (63). In particular, we searched for significant enrichments compared to the human genome using the EnrichR collection of databases, e.g., dbGaP, Panther 2016, HPO, and KEGG 2016 (data file S6). We then investigated known direct associations between the Neanderthal alleles of the NTT SNPs and phenotypes by searching the GWAS and PheWAS catalogs (https://phewascatalog.org/phewas) and applying the PheGenI tool (https://www.ncbi.nlm.nih.gov/gap/phegeni) (data file S6). We used the circos representation as in Kanai et al. (64) to highlight different sets of NTT SNPs (Fig. 4F).
