Sampling protocols and analysis procedures for stable oxygen isotope analysis of enamel carbonate follow those set out in Roberts et al. [173] (see also [174–176]). Teeth were cleaned to remove adhering material using air-abrasion, and a diamond-tipped drill was used to obtain a powder sample. The full length of the buccal surface was abraded in order to capture a representative bulk sample from the maximum period of formation. To remove organic or secondary carbonate contamination, the enamel powder was pre-treated in a wash of 1.5% sodium hypochlorite for 60 minutes; this was followed by three rinses in purified H2O and centrifuging, before 0.1 M acetic acid was added for 10 minutes. Samples were then rinsed again three times with milliQ H2O and freeze dried for 4 hours. Enamel powder was weighed out into 12 ml borosilicate glass vials and sealed with rubber septa. The vials were flush-filled with helium at 100 ml/min for 10 minutes. After reaction with 100% phosphoric acid, the CO2 of the sample was analyzed using a Thermo Gas Bench 2 connected to a Thermo Delta V Advantage Mass Spectrometer at the Stable Isotope Laboratory, Department of Archaeology, MPI-SHH. δ18Oc values were calibrated against international standards (IAEA NBS 18 [δ18O = -23.2±0.1‰], IAEA-603 [δ18O = -2.37±0.04‰], and IAEA-CO-8 [δ18O = -22.7±0.2‰]). Repeated analysis of MERCK standards suggests that machine measurement error is ca. ±0.1‰ for δ18Oc. Overall measurement precision was determined through the measurement of repeat extracts from a bovid tooth enamel standard (n = 20, ±0.3‰ for δ18Oc).
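The calibration of raw δ18Oc measurements against certified reference materials can be illustrated with a simple two-point linear normalization. The sketch below is a generic illustration under that assumption; the measured standard values are placeholders and the laboratory's actual data-reduction scheme (e.g., a regression through all three standards) may differ.

```python
# Illustrative two-point normalization of delta-18-O values against certified
# standards (e.g., IAEA-603 and NBS 18). Measured values here are placeholders;
# the laboratory's actual data-reduction routine may differ.

CERTIFIED = {"IAEA-603": -2.37, "NBS18": -23.2}   # per mil, VPDB
measured  = {"IAEA-603": -2.10, "NBS18": -22.8}   # hypothetical raw machine values

# Linear mapping through the two anchor standards: certified = a * measured + b
x1, x2 = measured["IAEA-603"], measured["NBS18"]
y1, y2 = CERTIFIED["IAEA-603"], CERTIFIED["NBS18"]
a = (y2 - y1) / (x2 - x1)
b = y1 - a * x1

def normalize(raw_d18O):
    """Map a raw instrument delta-18-O value onto the VPDB scale."""
    return a * raw_d18O + b

print(round(normalize(-5.0), 2))  # hypothetical sample value
```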
Sampling protocols and analytical procedures for strontium follow those set out in Copeland et al. [177]. Enamel powder was obtained with a diamond-tipped drill along the full length of the buccal surface after cleaning with air-abrasion. Up to 4 mg of enamel powder was digested in 2 ml of 65% HNO3 in a closed Teflon beaker placed on a hotplate for an hour at 140°C, followed by dry down and re-dissolving in 1.5 ml of 2 M HNO3 for strontium separation chemistry, which followed Pin et al. [178]. The separated strontium fraction was dried down and dissolved in 2 ml 0.2% HNO3 before dilution to 200 ppb Sr concentrations for analysis using a Nu Instruments NuPlasma High Resolution Multi-Collector Inductively Coupled Plasma Mass Spectrometer (HR-MC-ICP-MS) at the Department of Geological Sciences, University of Cape Town. Analyses were controlled by reference to bracketing analyses of NIST SRM987, using a 87Sr/86Sr reference ratio of 0.710255. Data were corrected for rubidium interference at 87 amu using the measured 85Rb signal and the natural 85Rb/87Rb ratio. Instrumental mass fractionation was corrected using the measured 86Sr/88Sr ratio and the exponential law, along with a true 86Sr/88Sr ratio of 0.1194. Results for repeat analyses of an in-house carbonate reference material processed and measured as unknown with the batches (87Sr/86Sr = 0.708909; 2 sigma = 0.000040; n = 7) are in agreement with long-term results for this in-house reference material (87Sr/86Sr = 0.708911; 2 sigma = 0.000040; n = 414).
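The two instrumental corrections described above (Rb interference at mass 87 and exponential-law mass fractionation) amount to a short calculation. The sketch below is only an illustration of that arithmetic, not the laboratory's reduction code; the beam intensities are hypothetical and nominal isotope masses and the natural 85Rb/87Rb abundance ratio are assumed.

```python
import math

# Illustration of the Rb-interference and exponential-law mass-bias corrections
# for Sr isotope ratios measured by MC-ICP-MS. Signal intensities are
# hypothetical; nominal atomic masses and the natural 85Rb/87Rb ratio are used.

M85RB, M87RB = 84.9118, 86.9092                     # nominal Rb isotope masses
M86SR, M87SR, M88SR = 85.9093, 86.9089, 87.9056     # nominal Sr isotope masses
RB87_RB85_NATURAL = 1.0 / 2.593                     # natural 87Rb/85Rb abundance ratio
SR86_SR88_TRUE = 0.1194                             # canonical normalization ratio

def corrected_87sr_86sr(i85, i86, i87, i88):
    """Return 87Sr/86Sr corrected for Rb interference and mass fractionation."""
    # Exponential-law exponent from the measured vs. canonical 86Sr/88Sr
    beta = math.log(SR86_SR88_TRUE / (i86 / i88)) / math.log(M86SR / M88SR)
    # Expected (mass-biased) 87Rb contribution to the mass-87 signal
    i87_rb = i85 * RB87_RB85_NATURAL * (M87RB / M85RB) ** (-beta)
    i87_sr = i87 - i87_rb
    # Fractionation-correct the Rb-stripped 87Sr/86Sr
    return (i87_sr / i86) * (M87SR / M86SR) ** beta

# Hypothetical beam intensities (V) for 85Rb, 86Sr, 87Sr, 88Sr
print(corrected_87sr_86sr(0.0005, 1.0, 0.715, 8.38))
```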
DNA data production for all eleven samples analyzed in this study took place in the dedicated aDNA facility of the MPI-SHH in Jena, Germany. The eleven samples belonged to ten individuals that were analyzed here for the first time and one individual that was already published as ALA015 in Skourtanioti et al. [49] but is re-assigned here to individual ALA110. The initial reason for analyzing the petrous bone of ALA110 was the outlier position this individual occupies in the strontium isotope analysis (see below). During analysis of the aDNA data, we realized that the petrous bones of ALA015 and ALA110 belonged to one and the same context and individual, and we subsequently merged their data into ALA110 accordingly, thus improving its aDNA data. In all samples, sampling targeted the inner-ear part of the petrous bone [164]. DNA extraction and double-stranded genomic library preparation for four samples (ALA118, ALA120, ALA123, and ALA124) followed the MPI-SHH Archaeogenetics protocols for Ancient DNA Extraction from Skeletal Material and for non-UDG-treated double-stranded (ds) ancient DNA library preparation for Illumina sequencing, archived and accessible at doi.org/10.17504/protocols.io.baksicwe and doi.org/10.17504/protocols.io.bakricv6, respectively. The library preparation protocol was modified with the introduction of partial Uracil DNA Glycosylase (UDG) treatment prior to the blunt-end repair, according to Rohland et al. [179]. Dual-indexed adaptors were prepared according to the archived MPI-SHH Archaeogenetics protocol accessible at doi.org/10.17504/protocols.io.bem5jc86.
For the remaining seven samples (ALA110, ALA128, ALA130, ALA131, ALA135, ALA136, and ALA138), DNA extraction was performed according to Rohland et al. [180], and single-stranded (ss) libraries (no UDG treatment) were prepared according to Gansauge et al. [181], both protocols using an automated liquid-handling system. All libraries were first shotgun sequenced (~5M reads) on an Illumina HiSeq 4000 platform. Raw FASTQ sequence data were processed through EAGER [182] for removal of adapters (AdapterRemoval [v2.2.0]) [183], read length filtering (>30 bp), mapping against the hs37d5 reference sequence (BWA [v0.7.12]) [184], quality filtering (q30), removal of PCR duplicates (dedup [v0.12.2]) [182], and DNA damage estimation (mapDamage [v2.0.6]) [185]. Two main characteristics of the sequenced reads were considered in order to select positive libraries for submission to an in-solution hybridization enrichment that targets 1,233,013 genome-wide and ancestry-informative single nucleotide polymorphisms (SNPs; “1240K SNP capture”) [50]: the first is the proportion of DNA damage at the ends of the reads (> ~5% C-to-T/G-to-A substitution at the terminal 5’ and 3’ base, depending on the UDG treatment of the library), and the second is the endogenous DNA content (>0.1%), calculated as the proportion of reads mapped against the hs37d5 reference over the total number of sequenced reads after length filtering. The enriched libraries were sequenced to the order of ≥20M reads, and the raw FASTQ sequence data were processed through EAGER as described above. BAM (binary alignment map) files across libraries from the same sample were merged with samtools and dedup was rerun. Masked versions of the BAM files were created with trimBam (https://genome.sph.umich.edu/wiki/BamUtil:_trimBam), which masks the read positions with high damage frequency, that is, the terminal 2 and 10 bases for the partially UDG-treated ds libraries and the ss libraries (no UDG), respectively. Coverage on the X, Y, and autosomal chromosomes was calculated on the masked BAM files with “samtools depth” (samtools v1.3) [186], providing a BED file with the 1240K SNP positions. X and Y coverage were normalized by the autosomal coverage (X-rate and Y-rate, respectively); females without contamination are expected to show X-rate ≈ 1 and Y-rate ≈ 0, whereas males without contamination are expected to show both rates ≈ 0.5. Contamination of the mitochondrial DNA was estimated with Schmutzi [187], and contamination of the nuclear DNA of males with ANGSD (the MoM estimate is provided) [188]. Additionally, AuthentiCT was applied to the BAMs generated from the ss libraries. This method models the accumulation of damage-induced deamination along the length of ancient molecules and can therefore estimate contamination from modern sources [189]. Currently, the method can only be applied to ss libraries without UDG treatment, whose molecules preserve the original damage.
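As an illustration of the X-rate/Y-rate sex assignment described above, the following sketch aggregates per-site depths (as produced by “samtools depth” restricted to the 1240K positions) by chromosome and applies the stated expectations. The file name and the tolerance bands around the expected rates are illustrative assumptions, not values taken from the study.

```python
# Sketch of the X-rate / Y-rate calculation for genetic sex assignment.
# Input: a three-column "samtools depth" table (chrom, pos, depth) restricted
# to the 1240K SNP positions. Tolerance bands below are illustrative.

from collections import defaultdict

def assign_sex(depth_file):
    """Return (X-rate, Y-rate, call) from a samtools-depth table."""
    totals, counts = defaultdict(int), defaultdict(int)
    with open(depth_file) as fh:
        for line in fh:
            chrom, _, depth = line.split()[:3]
            totals[chrom] += int(depth)
            counts[chrom] += 1
    autosomes = [str(i) for i in range(1, 23) if counts[str(i)]]
    aut_cov = sum(totals[c] for c in autosomes) / sum(counts[c] for c in autosomes)
    x_rate = (totals["X"] / counts["X"]) / aut_cov if counts["X"] else 0.0
    y_rate = (totals["Y"] / counts["Y"]) / aut_cov if counts["Y"] else 0.0
    if x_rate > 0.8 and y_rate < 0.1:
        call = "XX (female)"
    elif 0.4 < x_rate < 0.6 and 0.4 < y_rate < 0.6:
        call = "XY (male)"
    else:
        call = "inconclusive / possible contamination"
    return x_rate, y_rate, call

# Example: x, y, sex = assign_sex("ALA118.1240K.depth.txt")  # hypothetical file
```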
Genotypes were called with the tool pileupCaller (https://github.com/stschiff/sequenceTools/tree/master/src/SequenceTools) according to the Affymetrix Human Origins panel (~600K SNPs) [42, 190] and the 1240K panel [50], using the option randomHaploid, which randomly draws one read at every SNP position. Random calling was performed on both the original and the masked BAM files of each ds library and, for the final genotypes, we kept transitions from the masked and transversions from the original BAM files. For the ss libraries, genotypes were extracted from the original BAM files using the option ‘singleStrandMode’, which removes reads with post-mortem damage based on their alignment to the forward or the reverse strand of the human reference genome. Y-haplogroups were assigned to the male individuals after filtering the pileup from the masked BAMs for an updated list of ISOGG SNPs. For each of these SNPs, a record of its state (ancestral or derived) was collected. It was then manually checked whether the diagnostic SNPs observed for a given haplogroup formed a consistent path from the root to the tip of the tree, or whether there were spurious jumps in the phylogeny caused by damage.
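The per-SNP merging of calls from the masked and original BAM files of the ds libraries can be sketched as follows, assuming pseudo-haploid calls keyed by SNP identifier with reference/alternative allele pairs. This is a schematic of the merging logic only, not the pileupCaller-based workflow itself, and all data structures are illustrative.

```python
# Schematic of combining pseudo-haploid calls from masked and original BAMs:
# transitions are taken from the masked calls (to avoid deamination artefacts),
# transversions from the original calls (to retain coverage).

TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def is_transition(ref, alt):
    return frozenset((ref, alt)) in TRANSITIONS

def merge_calls(snp_alleles, masked_calls, original_calls, missing="N"):
    """snp_alleles: {snp_id: (ref, alt)}; *_calls: {snp_id: called allele}."""
    merged = {}
    for snp, (ref, alt) in snp_alleles.items():
        source = masked_calls if is_transition(ref, alt) else original_calls
        merged[snp] = source.get(snp, missing)
    return merged

example = merge_calls(
    {"rs1": ("C", "T"), "rs2": ("A", "C")},   # hypothetical SNPs
    {"rs1": "C"},                             # calls from the masked BAM
    {"rs1": "T", "rs2": "A"},                 # calls from the original BAM
)
print(example)  # {'rs1': 'C', 'rs2': 'A'}
```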
When ≥5x coverage on the mitochondrial DNA was reached after the 1240K enrichment, consensus mitochondrial sequences were inferred using Schmutzi [187] after mapping with CircularMapper [191] against the rCRS. A mapping quality filter of q30 and a consensus quality score of Q30 were applied. Mitochondrial haplogroups were assigned from the consensus sequences with HaploGrep [192].
The genome-wide data from the new individuals were combined with previously published ancient and modern data [39, 42, 45, 46, 49, 162, 193–220]. For readability, we kept most of the group labels used in Skourtanioti et al. [49], most importantly “Alalakh_MLBA”, “ALA019” (genetic outlier, n = 1), “Ebla_EMBA” (n = 9), “K.Kalehöyük_MLBA” (Kaman-Kalehöyük, n = 5), and “TellKurdu_EC” (n = 5), but labeled individuals from Sidon as “Sidon_MBA” (instead of “Levant_MBA”; n = 5). A principal component analysis was performed on a subset of western Eurasian populations of the Human Origins dataset using the smartpca program of EIGENSOFT (v6.01) [221, 222] (default parameters with the options lsqproject: YES and numoutlieriter: 0).
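The lsqproject option projects ancient individuals, which typically have missing genotypes, onto principal components computed from the modern reference populations by least squares over the non-missing SNPs. The numpy sketch below illustrates that idea with toy data; it is not smartpca's implementation, and all variable names are assumptions.

```python
import numpy as np

# Minimal sketch of least-squares projection onto principal components
# (the idea behind smartpca's "lsqproject: YES"), not smartpca itself.

def fit_axes(modern_geno, n_pcs=2):
    """modern_geno: (individuals x SNPs) matrix without missing data."""
    mean = modern_geno.mean(axis=0)
    _, _, vt = np.linalg.svd(modern_geno - mean, full_matrices=False)
    return mean, vt[:n_pcs]            # SNP means and top principal axes

def lsq_project(sample, mean, axes):
    """sample: (SNPs,) genotype vector with np.nan at missing positions."""
    ok = ~np.isnan(sample)
    coords, *_ = np.linalg.lstsq(axes[:, ok].T, sample[ok] - mean[ok], rcond=None)
    return coords                      # coordinates on the fitted PCs

rng = np.random.default_rng(0)
modern = rng.integers(0, 3, size=(50, 200)).astype(float)   # toy 0/1/2 genotypes
ancient = modern[0].copy()
ancient[rng.random(200) < 0.6] = np.nan                     # simulate missingness
mean, axes = fit_axes(modern)
print(lsq_project(ancient, mean, axes))
```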
The degree of genetic relationship among Alalakh_MLBA individuals (n = 34 after quality filtering) was assessed by applying and comparing two different methods: READ [223] and lcMLkin [224]. READ estimates up to second-degree relationships from low-coverage genomes by calculating the proportion of non-matching alleles for a pair of individuals (P0) in non-overlapping windows of 1 Mbp. P0 was normalized by the median P0 across all pairs (assuming that most pairs are unrelated) in order to reduce the effects of SNP ascertainment, within-population diversity, and marker density.
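A schematic of the P0 statistic is given below: for each pair of individuals, the proportion of mismatching pseudo-haploid alleles is computed in non-overlapping 1 Mbp windows, and each pair's mean P0 is then divided by the median across all pairs. The data structures are illustrative assumptions, and this is a sketch of the statistic rather than the READ program itself.

```python
# Schematic of READ-style pairwise mismatch rates (P0) in 1 Mbp windows.
# calls: {individual: {(chrom, pos): allele}} pseudo-haploid calls (illustrative).

from itertools import combinations
from statistics import mean, median

WINDOW = 1_000_000

def pair_p0(calls_a, calls_b):
    """Mean proportion of non-matching alleles per non-overlapping 1 Mbp window."""
    windows = {}
    for site, a in calls_a.items():
        b = calls_b.get(site)
        if b is None:
            continue                       # site not covered in both individuals
        chrom, pos = site
        key = (chrom, pos // WINDOW)
        mism, total = windows.get(key, (0, 0))
        windows[key] = (mism + (a != b), total + 1)
    return mean(m / t for m, t in windows.values())

def normalized_p0(calls):
    """Divide each pair's P0 by the median P0, assuming most pairs are unrelated."""
    raw = {(i, j): pair_p0(calls[i], calls[j]) for i, j in combinations(calls, 2)}
    med = median(raw.values())
    return {pair: p0 / med for pair, p0 in raw.items()}
```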
lcMLkin uses a maximum-likelihood framework on genotype likelihoods from low-coverage DNA sequencing data and infers k0, k1, and k2, the probabilities that a pair of individuals share, respectively, zero, one, or two alleles identical-by-descent (IBD), as well as the overall coefficient of relatedness (r). Two useful aspects of this method are that it can distinguish between parent-offspring pairs (k0 = 0) and siblings (k0 > 0, depending on recombination) and that it can infer relatedness down to the 5th degree. However, discrepancies from the expected k0, k1, k2, and r values can occur under scenarios of recent admixture, inbreeding, contamination, and low-quality data. We ran lcMLkin on the masked BAM files with the options -l phred and -g best.
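The relationship between the IBD-sharing probabilities and the relatedness coefficient is r = k1/2 + k2; the brief illustration below lists the expected values for common pedigree relationships in the absence of inbreeding, admixture, contamination, or data problems.

```python
# Relatedness coefficient from IBD-sharing probabilities, r = k1/2 + k2,
# with expected values for common relationships (no inbreeding or admixture).

def relatedness(k0, k1, k2):
    return 0.5 * k1 + k2

print(relatedness(0.00, 1.00, 0.00))   # parent-offspring -> 0.5
print(relatedness(0.25, 0.50, 0.25))   # full siblings    -> 0.5
print(relatedness(0.50, 0.50, 0.00))   # second degree    -> 0.25
print(relatedness(1.00, 0.00, 0.00))   # unrelated        -> 0.0
```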
Modelling of ancestry proportions was performed with the qpWave and qpAdm programs from ADMIXTOOLS [v7.1] [190] using the following set of Right populations (also named outgroups or references): Mbuti.DG, Ami.DG, Onge.DG, Mixe.DG, Kostenki14, Eastern European hunter-gatherers (EEHG), Western European hunter-gatherers (WEHG), Levant_EP, and Anatolia_N (Barcın, Mentese, and Boncuklu). These programs compute a matrix of f4-statistics for the Right and Left populations (targets for qpWave; target and sources for qpAdm) of the form Fij = F4(L1, Li; R1, Rj). A likelihood ratio test then compares the null model against the full-rank model in which all columns of the matrix are independent. In the latter model, the n Left populations relate to the references through n waves of ancestry, which, for qpAdm, implies that the target cannot be explained as a combination of the selected source populations (the null model). Depending on the chosen cutoff, a tested null model with a p-value ≤0.01 or ≤0.05 and/or infeasible admixture coefficients (outside the 0–1 range) is considered a poor fit to the data and is thus rejected. For this group-based analysis, we kept only individuals who are not genetically related.
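The model-rejection rule described above can be summarized as a simple check on the rank-test p-value and the feasibility of the estimated admixture proportions. The helper below is a schematic of that decision logic only, not part of ADMIXTOOLS, and the example values are hypothetical.

```python
# Schematic of the qpAdm model-acceptance rule described above: a model is
# rejected when the rank-test p-value falls below the chosen cutoff or when any
# estimated admixture proportion lies outside the 0-1 range.

def qpadm_model_ok(p_value, proportions, cutoff=0.05):
    feasible = all(0.0 <= p <= 1.0 for p in proportions)
    return p_value > cutoff and feasible

print(qpadm_model_ok(0.32, [0.61, 0.39]))    # plausible two-way model -> True
print(qpadm_model_ok(0.003, [0.61, 0.39]))   # rejected by rank test   -> False
print(qpadm_model_ok(0.40, [1.08, -0.08]))   # infeasible coefficients -> False
```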