We downloaded phased haplotypes for 278 individuals from https://sharehost.hms.harvard.edu/genetics/reich_lab/sgdp/phased_data/PS2_multisample_public/ and rephased these jointly with the high-coverage ancient genomes (Ancient Genomes Data) using SHAPEIT4 (Delaneau et al. 2019). We first used the 1000 Genomes Project (1000GP) reference panel (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/) to phase all sites overlapping with 1000GP and then internally phased all remaining sites while keeping the already phased sites fixed.
We downloaded 430 ancient genomes for use in this study (supplementary table 1, Supplementary Material online). All samples had a genome-wide mean coverage of 0.5x or more. We selected 14 high-coverage ancient genomes (mean genomic coverage >7.8x) for the Relate analysis.
For these 14 high-coverage genomes (supplementary table 1, Supplementary Material online), genotypes were called using samtools mpileup (input options: -C 50, -Q 20, and -q 20) and bcftools call --consensus-caller, with indels ignored (Li 2011). A modified version of the bamCaller.py script from https://github.com/stschiff/msmc-tools was used to output variant sites. We generated a quality mask for each ancient genome, declaring as passing only sites with at least 5x coverage and below twice the mean genomic coverage.
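The per-genome quality mask rule can be illustrated with a minimal Python sketch. This is not the modified bamCaller.py script used in the study; the function names, the list-of-depths input, and the boolean mask representation are assumptions made for illustration only. The mean genomic coverage is passed in explicitly, since how it was computed is not specified here.

```python
from typing import List


def passes_quality_mask(depth: int, mean_coverage: float, min_depth: int = 5) -> bool:
    """Return True if a site has at least `min_depth` reads and
    fewer than twice the mean genomic coverage (hypothetical helper)."""
    return min_depth <= depth < 2 * mean_coverage


def build_quality_mask(depths: List[int], mean_coverage: float) -> List[bool]:
    """Apply the depth rule to every site of one genome.

    `depths` holds one read depth per site; this in-memory layout is an
    assumption, not the file format used in the study.
    """
    return [passes_quality_mask(d, mean_coverage) for d in depths]


# Toy example: seven sites in a genome with mean coverage 10x.
example_mask = build_quality_mask([0, 4, 5, 10, 19, 20, 40], mean_coverage=10.0)
```

With a mean coverage of 10x, only sites with depth from 5 up to, but not including, 20 are marked as passing.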
We merged these 14 ancient genomes with the 278 SGDP samples to infer joint genealogies using Relate. We constructed a conservative joint mask, declaring a site as passing only if it passed in all 14 ancient genomes and in the universal mask file provided with the SGDP data set. The SGDP universal mask was obtained from https://reichdata.hms.harvard.edu/pub/datasets/sgdp/filters/all_samples/.
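The joint mask amounts to a site-wise intersection of the individual masks. The following Python sketch shows that logic under simplifying assumptions: masks are represented as equal-length lists of booleans (True meaning passing), which is not the on-disk format of the SGDP mask files, and the function name `build_joint_mask` is hypothetical.

```python
from typing import List, Sequence


def build_joint_mask(
    sample_masks: Sequence[Sequence[bool]],
    universal_mask: Sequence[bool],
) -> List[bool]:
    """Mark a site as passing only if it passes in every per-sample mask
    and in the universal mask (site-wise logical AND)."""
    n_sites = len(universal_mask)
    if any(len(mask) != n_sites for mask in sample_masks):
        raise ValueError("all masks must cover the same number of sites")
    return [
        universal_mask[i] and all(mask[i] for mask in sample_masks)
        for i in range(n_sites)
    ]


# Toy example: two per-sample masks and a universal mask over four sites.
joint = build_joint_mask(
    [[True, True, False, True], [True, False, True, True]],
    [True, True, True, False],
)
```

In this toy example only the first site passes, because every other site fails in at least one of the three masks.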