DNA extraction and genotyping were performed on saliva samples by the National Genetic Institute, USA. Participants were genotyped on one of four different platforms: V1, V2, V3, and V4. The V1 and V2 platforms have a total of 560,000 SNPs, largely based on the Illumina HumanHap550+ BeadChip. The V3 platform has 950,000 SNPs based on the Illumina OmniExpress+ BeadChip and has custom content to improve the overlap with the V2 platform. The V4 platform is a fully customized array and has about 570,000 SNPs. All samples had a call rate greater than 98.5%. A total of 1,030,430 SNPs (including insertions/deletions, or InDels) were genotyped across all platforms.
Imputation was performed using the March 2012 (v3) release of the 1000 Genomes Phase 1 reference haplotypes. First, we used Beagle (version 3.3.1) [40] to phase batches of 8000–9000 individuals across chromosomal segments of no more than 10,000 genotyped SNPs, with overlaps of 200 SNPs. SNPs were excluded if they were not in Hardy–Weinberg equilibrium (P < 10⁻²⁰), had a genotype call rate less than 95%, or had discrepancies in allele frequency compared to the reference European 1000 Genomes data (χ² P < 10⁻¹⁵). We then imputed each phased segment against all-ethnicity 1000 Genomes haplotypes (excluding monomorphic and singleton sites) using Minimac2 [41], using 5 rounds and 200 states for parameter estimation.
We restricted the analyses to SNPs with a minor allele frequency of at least 1%. Genotyped SNPs that were present only on platform V1, or that were located on chromosome Y or the mitochondrial chromosome, were excluded due to small sample sizes and unreliable genotype calling, respectively. Next, using trio data from all research participants in the 23andMe dataset, where available, SNPs that failed a parent-offspring transmission test were excluded. For imputed SNPs, we excluded SNPs with average r² < 0.5 or minimum r² < 0.3 in any imputation batch, as well as SNPs that had strong evidence of an imputation batch effect.
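
As an illustration only, the two r² thresholds for imputed SNPs could be applied as in the Python sketch below. It assumes that "average" and "minimum" are both taken across imputation batches for a given SNP, which is one reading of the sentence above. The function name, the input layout, and the example values are hypothetical, and the batch effect exclusion is not covered here.

```python
from statistics import mean

AVG_R2_MIN = 0.5  # threshold on the average r² across batches (from the text)
MIN_R2_MIN = 0.3  # threshold on the lowest per-batch r² (from the text)


def passes_imputation_r2(batch_r2):
    """Return True if a SNP's per-batch r² values pass both thresholds.

    batch_r2 is a list with one imputation r² value per batch (assumed layout).
    """
    if not batch_r2:
        raise ValueError("need at least one imputation batch")
    return mean(batch_r2) >= AVG_R2_MIN and min(batch_r2) >= MIN_R2_MIN


# Hypothetical per-batch r² values for three made-up SNPs.
example = {
    "snp_a": [0.92, 0.88, 0.95],  # passes both thresholds
    "snp_b": [0.80, 0.25, 0.90],  # fails: one batch has r² below 0.3
    "snp_c": [0.45, 0.48, 0.52],  # fails: average r² is below 0.5
}

kept = [snp for snp, r2 in example.items() if passes_imputation_r2(r2)]
print(kept)
```

A SNP is dropped as soon as either condition fails, so a single poorly imputed batch is enough to exclude it even when the average looks acceptable.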
The batch effect test is an F-test from an analysis of variance of the SNP dosages against a factor representing imputation batch; we excluded results with P < 10⁻⁵⁰. After quality control, 9,955,952 SNPs were analysed. Genotyping, imputation, and preliminary quality control were performed by 23andMe.
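
The F statistic behind such a batch effect test can be sketched as a standard one-way analysis of variance of dosage on batch label. The Python below is a minimal illustration, not the pipeline actually used: the function name and the toy data are hypothetical, and it stops at the F statistic and its degrees of freedom. Turning these into a P value requires the upper tail of the F distribution, for example `scipy.stats.f.sf(F, df_between, df_within)` if SciPy is available.

```python
from statistics import mean


def anova_f(dosages, batches):
    """One-way ANOVA F statistic of SNP dosages grouped by batch label.

    Returns (F, df_between, df_within).
    """
    if len(dosages) != len(batches):
        raise ValueError("dosages and batches must have the same length")

    # Group the dosages by their imputation batch label.
    groups = {}
    for dosage, batch in zip(dosages, batches):
        groups.setdefault(batch, []).append(dosage)

    n, k = len(dosages), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two batches and more samples than batches")

    grand_mean = mean(dosages)
    # Variation of batch means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of individual dosages around their own batch mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    if ss_within == 0:
        raise ValueError("no within-batch variance; F is undefined")

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data: six hypothetical dosages split across two batches.
dosages = [0.0, 1.0, 2.0, 1.0, 2.0, 3.0]
batches = ["b1", "b1", "b1", "b2", "b2", "b2"]
f_stat, df_b, df_w = anova_f(dosages, batches)
print(f_stat, df_b, df_w)
```

A large F means that mean dosage differs between batches by more than the within-batch scatter would explain, which is the pattern the P < 10⁻⁵⁰ exclusion is meant to catch.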