Genotyping and quality control (QC) for BioMe participants have been described previously [19]. In brief, BioMe participants (N=32,595) were genotyped on the Illumina Global Screening Array (GSA) platform (635,623 variants). All QC steps were conducted using PLINK (v1.90b3.43) [36,37] and were stratified by self-reported race/ethnicity category. Individuals were filtered according to heterozygosity rate (removed if more than 6 standard deviations above or below the population-specific mean), call rate (removed if <95%), discordance between self-reported and genetically determined sex, and duplication. Sites were filtered for call rate (sites below 95% were excluded) and for violation of Hardy-Weinberg equilibrium (HWE) (threshold of p < 1 × 10⁻⁵ for all populations except HL, where it was set to p < 1 × 10⁻¹³). After these QC steps, 31,705 individuals and 604,869 sites remained for downstream analysis. To estimate genetic relatedness, pairwise kinship coefficients were then computed for all remaining participants from all SNPs using the KING software (v1.4) [38] with the --kinship flag. For any first- or second-degree relationship (defined by a pairwise kinship coefficient ≥ 0.0442 in the KING output), we randomly removed one individual of the pair from the analysis (leading to the exclusion of 3,500 participants).