Genotype data and quality control

Josefin Werme
Sophie van der Sluis
Danielle Posthuma
Christiaan A. de Leeuw

All genotype and phenotype data were obtained from the UK Biobank56 (release 3, March 2018), and this study was conducted under UK Biobank application 16406. Data collection, primary quality control, and imputation of the genotype data were performed by the UK Biobank itself; full details have been described elsewhere70. We applied further quality control to ensure that only high-quality variants were included. We retained SNPs with an imputation info score of at least 0.9 (HRC panel imputed), missingness of at most 5%, and a minor allele frequency of at least 1%, resulting in a total of 8,614,007 SNPs for the analysis.
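The three variant-level thresholds can be expressed as a simple filter. The following is a minimal sketch and not the authors' pipeline: the `SnpStats` record, the function names, and the example values are hypothetical, and it assumes per-SNP info score, missingness, and minor allele frequency have already been computed. Only the three threshold values come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Thresholds as stated in the text; everything else here is illustrative.
MIN_INFO = 0.9          # minimum imputation info score
MAX_MISSINGNESS = 0.05  # maximum proportion of missing genotype calls
MIN_MAF = 0.01          # minimum minor allele frequency


@dataclass(frozen=True)
class SnpStats:
    """Hypothetical per-SNP summary record (not a UK Biobank format)."""
    snp_id: str
    info: float
    missingness: float
    maf: float


def passes_qc(snp: SnpStats) -> bool:
    """Return True if the SNP meets all three thresholds (bounds inclusive)."""
    return (
        snp.info >= MIN_INFO
        and snp.missingness <= MAX_MISSINGNESS
        and snp.maf >= MIN_MAF
    )


def filter_snps(snps: Iterable[SnpStats]) -> List[str]:
    """Return the IDs of the SNPs that pass quality control, in input order."""
    return [s.snp_id for s in snps if passes_qc(s)]


# Made-up example values, one SNP per failure mode.
example = [
    SnpStats("snp_a", info=0.98, missingness=0.01, maf=0.20),   # passes
    SnpStats("snp_b", info=0.85, missingness=0.01, maf=0.20),   # info too low
    SnpStats("snp_c", info=0.95, missingness=0.08, maf=0.20),   # too much missingness
    SnpStats("snp_d", info=0.95, missingness=0.01, maf=0.005),  # too rare
]
kept = filter_snps(example)
print(kept)
```

Treating each bound as inclusive follows the wording "minimum", "maximum", and "at least"; whether the original analysis used strict or inclusive comparisons is not stated.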

We used only unrelated European samples with concordant sex (see Suppl. Info (A): UK Biobank Sample Information and Quality Control). Thirty principal components (computed with FlashPCA71) were included as covariates in all analyses to control for population stratification. To keep the set of SNPs constant across environments, quality control and filtering were performed on the full subset of individuals with complete neuroticism data (see below). Exact minor allele frequencies and call rates may therefore vary slightly between the sample subsets for each environment.
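The consequence of selecting SNPs once on the full sample can be illustrated with a toy example. This is a hedged sketch, not the authors' code: the genotype coding (0, 1, or 2 copies of the counted allele, `None` for a missing call), the function names, and the data are all assumptions made for illustration, and only the minor allele frequency criterion is shown.

```python
from typing import Dict, List, Optional, Sequence

MIN_MAF = 0.01  # threshold from the text

# Assumed coding: allele count 0/1/2, or None for a missing call.
Genotype = Optional[int]


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Minor allele frequency among non-missing calls (0.0 if none are called)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def select_snps(
    genotypes_by_snp: Dict[str, List[Genotype]], min_maf: float = MIN_MAF
) -> List[str]:
    """Select SNPs whose MAF in the supplied (full) sample meets the threshold."""
    return [
        snp
        for snp, genotypes in genotypes_by_snp.items()
        if minor_allele_frequency(genotypes) >= min_maf
    ]


# Made-up genotypes for eight individuals.
full_sample = {
    "snp_a": [0, 1, 0, 0, 2, 0, 1, 0],
    "snp_b": [0, 0, 0, 0, 0, 0, 1, 0],
    "snp_c": [0, 0, 0, 0, 0, 0, 0, 0],  # monomorphic, removed by the filter
}
subset_index = [0, 1, 2, 3]  # hypothetical environment-specific subset

# The SNP set is fixed once, using the full sample.
retained = select_snps(full_sample)

# Within the subset, frequencies are recomputed but the SNP set is unchanged.
subset_maf = {
    snp: minor_allele_frequency([full_sample[snp][i] for i in subset_index])
    for snp in retained
}
print(retained)
print(subset_maf)
```

Here `snp_b` passes the filter in the full sample but has no minor alleles in the subset. It stays in the analysis anyway, which is the trade-off the text describes: an identical SNP set everywhere, at the cost of slightly different frequencies per subset.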
