We obtained genotype information on 94,474 BioVU individuals of different ancestral and racial backgrounds genotyped on the Illumina MEGAEX array. Using PLINK v1.9 [15], genotypes were filtered for SNP and individual call rates, sex discrepancies, and excessive heterozygosity (Additional file 1). We selected individuals of European or African ancestry using principal component analysis implemented in Eigenstrat [16, 17] and confirmed the absence of genotyping batch effects through logistic regression with “batch” as the phenotype. Imputation was completed on the Michigan Imputation Server [18] with the Haplotype Reference Consortium (HRC) reference panel. Imputed SNPs were then filtered on imputation quality (R² > 0.3) and converted to hard calls. We restricted the data to autosomal SNPs and retained those with minor allele frequency > 0.01 and Hardy-Weinberg equilibrium p > 1 × 10⁻¹⁰, removing SNPs whose allele frequencies differed by more than 10% from the 1000 Genomes Project phase 3 CEU or ASW set for the European- and African-ancestry samples, respectively [19]. The resulting dataset contained 6,303,629 SNPs on 72,824 individuals of European genetic ancestry and 12,798,111 SNPs on 15,283 individuals of African genetic ancestry.
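As an illustration only, the post-imputation SNP filters described above can be sketched in Python. This is not the pipeline used in the study, which relied on PLINK v1.9 and the Michigan Imputation Server. The `Variant` record, its field names, and the `passes_filters` function are hypothetical, the Hardy-Weinberg p-value is assumed to be precomputed, and the 10% allele-frequency criterion is assumed here to mean an absolute frequency difference of 0.10.

```python
from dataclasses import dataclass

# Hypothetical per-SNP summary record; field names are illustrative only.
@dataclass
class Variant:
    snp_id: str
    chrom: str             # chromosome label, e.g. "1".."22", "X"
    alt_freq: float        # alternate-allele frequency in the study sample
    ref_panel_freq: float  # same allele's frequency in the reference set (CEU or ASW)
    imputation_r2: float   # imputation quality score
    hwe_p: float           # precomputed Hardy-Weinberg equilibrium p-value

AUTOSOMES = {str(c) for c in range(1, 23)}

def passes_filters(v, min_r2=0.3, min_maf=0.01,
                   max_freq_diff=0.10, min_hwe_p=1e-10):
    """Return True if the SNP survives every filter named in the text."""
    # Minor allele frequency is the smaller of the two allele frequencies.
    maf = min(v.alt_freq, 1.0 - v.alt_freq)
    return (
        v.chrom in AUTOSOMES                  # autosomal SNPs only
        and v.imputation_r2 > min_r2          # imputation quality R2 > 0.3
        and maf > min_maf                     # MAF > 0.01
        # Assumption: "10%" is read as an absolute frequency difference.
        and abs(v.alt_freq - v.ref_panel_freq) <= max_freq_diff
        and v.hwe_p > min_hwe_p               # HWE p > 1e-10
    )

variants = [
    Variant("snp_a", "1", 0.25, 0.30, 0.90, 0.5),     # passes all filters
    Variant("snp_b", "X", 0.25, 0.30, 0.90, 0.5),     # not autosomal
    Variant("snp_c", "2", 0.995, 0.99, 0.90, 0.5),    # MAF too low
    Variant("snp_d", "3", 0.25, 0.40, 0.90, 0.5),     # frequency differs from reference
    Variant("snp_e", "4", 0.25, 0.30, 0.90, 1e-12),   # fails HWE threshold
    Variant("snp_f", "5", 0.25, 0.30, 0.20, 0.5),     # poor imputation quality
]
kept = [v.snp_id for v in variants if passes_filters(v)]
```

In this toy input only `snp_a` is retained; each of the other records violates exactly one of the criteria, which makes the role of each threshold easy to inspect.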