METHOD DETAILS

Ryan K. Waples
Aviaja L. Hauptmann
Inge Seiding
Emil Jørsboe
Marit E. Jørgensen
Niels Grarup
Mette K. Andersen
Christina V.L. Larsen
Peter Bjerregaard
Garrett Hellenthal
Torben Hansen
Anders Albrechtsen
Ida Moltke

All Greenlandic participants were genotyped on two SNP arrays: the CardioMetaboChip (196,224 SNPs)2,22,32 and the Multi-Ethnic Global Array (~1.5M SNPs).33 Data from these two SNP arrays were merged on the plus strand, and 3972 individuals with genotypes from both SNP arrays and a missing rate below 0.02 were retained. From these data we removed singletons, sites not on an autosome, and sites with a significant (p < 1e-10) deviation from Hardy-Weinberg equilibrium in a test that accounts for admixture.29
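
The individual-level missingness filter and the singleton filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used for the study: the function names and the in-memory data layout are hypothetical, genotypes are assumed to be coded as 0/1/2 copies of one allele with None for a missing call, and a "singleton" is assumed to mean a site whose minor allele is observed exactly once. The admixture-aware Hardy-Weinberg test is not reproduced here.

```python
# Hypothetical layout: {individual_id: [genotype at site 0, site 1, ...]}
# Genotypes are 0/1/2 allele copies; None marks a missing call (assumption).

def missing_rate(genotypes):
    """Fraction of sites with a missing call for one individual."""
    return sum(g is None for g in genotypes) / len(genotypes)


def filter_individuals(geno_by_individual, max_missing=0.02):
    """Keep individuals whose missing rate is below max_missing."""
    return {
        iid: genotypes
        for iid, genotypes in geno_by_individual.items()
        if missing_rate(genotypes) < max_missing
    }


def minor_allele_count(site_genotypes):
    """Count of the rarer allele among non-missing calls at one site."""
    called = [g for g in site_genotypes if g is not None]
    alt = sum(called)
    return min(alt, 2 * len(called) - alt)


def drop_singletons(geno_by_individual):
    """Remove sites whose minor allele count is exactly 1 (assumed definition)."""
    ids = list(geno_by_individual)
    n_sites = len(geno_by_individual[ids[0]])
    keep = [
        j for j in range(n_sites)
        if minor_allele_count([geno_by_individual[i][j] for i in ids]) != 1
    ]
    return {i: [geno_by_individual[i][j] for j in keep] for i in ids}
```

Note that this sketch leaves monomorphic sites in place; whether those were present or removed at this stage is not stated in the text.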

The European SNP array data are from the Wellcome Trust Case Control Consortium (EGAD00000000120, EGAD00010000124, EGAD00010000288, EGAD00010000632),20,21 and were selected to represent a broad spectrum of potential European admixture sources in Greenland (Figure S1). The European datasets were lifted to hg19 and put on the plus strand, and sites with rates of missing data > 0.05 were removed prior to merging. We also excluded sites within the MHC region and within the HsInv0501 inversion on chromosome 8, as well as sites with more than two alleles. Finally, we limited the number of individuals from each European country to at most 1000 and confirmed that there were no related individuals within each European country.
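
As an illustration of capping the number of individuals per European country, a minimal Python sketch follows. The text does not say how individuals were chosen when a country exceeded the cap, so the seeded random subsample used here is an assumption, as are the function name and the input mapping.

```python
import random


def cap_per_country(country_of, max_per_country=1000, seed=0):
    """Return individual IDs with at most max_per_country per country.

    country_of: hypothetical mapping {individual_id: country_label}.
    Random subsampling with a fixed seed is an assumption, not the
    documented selection procedure.
    """
    rng = random.Random(seed)
    by_country = {}
    for iid in sorted(country_of):
        by_country.setdefault(country_of[iid], []).append(iid)

    kept = []
    for country in sorted(by_country):
        members = by_country[country]
        if len(members) > max_per_country:
            members = rng.sample(members, max_per_country)
        kept.extend(sorted(members))
    return kept
```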

For the ADMIXTURE analyses and local ancestry analyses with RFmix (see below) we selected the Han Chinese in Beijing (CHB), Yoruba in Ibadan (YRI), and Utah residents with Northern and Western European Ancestry (CEU) population samples from the Thousand Genomes Project (1000G),23 for a total of 310 individuals. We used the phased genotypes from phase 3 aligned to GRCh37.

For the haplotype-based analyses we worked on a dataset where the Greenlandic data and the European reference samples were merged. We kept all sites present in both datasets and excluded 52 sites with more than 2% missing data. The resulting merged dataset had 135,702 loci and 12,247 individuals with a total genotyping rate of 0.9995 and all loci with a minor allele count of at least 5.

The merged Greenlandic-European dataset was split by chromosome and phased without a reference panel using SHAPEIT34 (v2.r904) with default settings, using the HapMap phase II recombination map for hg19.

After merging and phasing, we removed close relatives among all Greenlandic individuals by retaining at most one individual from each pair of individuals with a coefficient of relatedness > 0.2. Then we split the remaining Greenlanders into two sets based on the results of a K = 2 ADMIXTURE analysis: 1) the unadmixed Greenlanders with >99% inferred Inuit ancestry, and 2) the admixed Greenlanders with > % inferred European ancestry; for additional details see Data S1. From the second set, we removed seventeen Greenlandic individuals estimated to have >5% African or >7% Asian ancestry in a K = 4 ADMIXTURE analysis including 1000G samples from China (CHB), Nigeria (YRI), and the US (CEU). These thresholds were selected to exclude individuals that differed markedly from the majority of other Greenlandic individuals (data not shown) and to avoid having to include any Asian and African reference samples in our fine-scale analyses. We also excluded admixed Greenlandic individuals living in Denmark, as these individuals may be more likely to have Danish ancestry than other European ancestries. This left us with a dataset consisting of 1582 not closely related Greenlanders with European admixture (admixed samples), 181 not closely related unadmixed Greenlanders (Inuit reference samples), and 8303 European reference samples.
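
One way to retain at most one individual from each pair with relatedness > 0.2 is a greedy pruning, sketched below in Python. The text does not state which member of a related pair was kept, so the rule used here (repeatedly drop the individual involved in the most remaining related pairs) is an assumption, and the function name and input format are hypothetical.

```python
from collections import defaultdict


def prune_relatives(pairs, threshold=0.2):
    """Return the set of individuals to remove so that no pair with
    relatedness above threshold remains.

    pairs: iterable of (id_a, id_b, relatedness) tuples (hypothetical format).
    Greedy choice of whom to drop is an assumption, not the documented rule.
    """
    neighbours = defaultdict(set)
    for a, b, r in pairs:
        if r > threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    removed = set()
    while True:
        # Individuals that still have at least one related partner;
        # ties are broken by ID so the result is deterministic.
        candidates = [(len(n), iid) for iid, n in neighbours.items() if n]
        if not candidates:
            break
        _, worst = max(candidates)
        for other in neighbours.pop(worst):
            neighbours[other].discard(worst)
        removed.add(worst)
    return removed
```

Dropping the most connected individual first tends to keep more samples than removing one member of every pair independently, which is why this heuristic is common, but other choices would satisfy the same criterion.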

Based on the results of a pilot ChromoPainter analysis, we subsequently excluded 28 of the European reference samples because they were significant outliers (z-score > 5) when their total chunk counts were compared to those of the rest of the individuals from their country (not shown). An atypically high number of chunks can be indicative of low data quality. This resulted in a final set of 8275 European reference samples (Figure S1), and thus 8275 + 181 = 8456 reference samples in total, together with 1582 not closely related Greenlanders with European admixture. These data were used to infer ancestry contributions; for details of this analysis, see the Quantification and statistical analysis section and Data S2.
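
The per-country outlier screen on total chunk counts can be illustrated with the following Python sketch. It assumes a leave-one-out z-score (each individual is compared to the mean and sample standard deviation of the other individuals from the same country) and a one-sided cutoff; the text does not specify these details, and the function name and inputs are hypothetical.

```python
from statistics import mean, stdev


def chunk_count_outliers(chunk_counts, country_of, z_max=5.0):
    """Return IDs whose total chunk count has z-score > z_max within country.

    chunk_counts: hypothetical mapping {individual_id: total chunk count}.
    country_of:   hypothetical mapping {individual_id: country_label}.
    Leave-one-out mean/sd and a one-sided test are assumptions.
    """
    by_country = {}
    for iid, country in country_of.items():
        by_country.setdefault(country, []).append(iid)

    outliers = set()
    for members in by_country.values():
        for iid in members:
            rest = [chunk_counts[j] for j in members if j != iid]
            if len(rest) < 2:
                continue  # too few individuals to estimate a spread
            sd = stdev(rest)
            if sd == 0:
                continue  # z-score undefined when the rest are identical
            z = (chunk_counts[iid] - mean(rest)) / sd
            if z > z_max:
                outliers.add(iid)
    return outliers
```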

To construct a dataset for the ADMIXTURE and local ancestry analyses, we merged the Greenlandic genotype data with data from 310 individuals from three 1000G populations: Han Chinese in Beijing (CHB), Yoruba in Ibadan (YRI), and Utah residents with Northern and Western European Ancestry (CEU). We subsequently removed 46 sites at which the allele frequency in the CEU individuals differed by more than 0.25 from that of the European admixture component in the K = 2 analysis (see below), retaining 521,622 overlapping sites.
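
The frequency-difference filter amounts to a simple per-site comparison, sketched here in Python. The function name and the two input mappings are hypothetical; the sketch assumes both frequencies refer to the same allele at each site.

```python
def sites_within_frequency_tolerance(freq_ceu, freq_eur_component, max_diff=0.25):
    """Return site IDs whose CEU frequency is within max_diff of the
    frequency in the European ADMIXTURE component.

    freq_ceu, freq_eur_component: hypothetical mappings
    {site_id: frequency of the same allele}. Sites absent from either
    mapping are dropped, since only overlapping sites are retained.
    """
    return [
        site
        for site in freq_ceu
        if site in freq_eur_component
        and abs(freq_ceu[site] - freq_eur_component[site]) <= max_diff
    ]
```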
