The UK Biobank (UKB) project comprises ~500,000 British volunteers who gave informed consent, and provides genetic data, non-imaging variables, and brain imaging data acquired using a fixed protocol85. We used brain T1-weighted magnetic resonance imaging (MRI) scans from the UKB, together with genotyping and covariate information (e.g., sex, age, height, and weight), as the discovery dataset. We used release v1.5 (August 2018), which holds a cohort of 21,780 subjects. This cohort consisted of adults (40 to 70 years old; mean 60 years), with slightly more females than males (51.6% versus 48.4%), a predominantly self-reported white British ancestry (97.1%), and an average body mass index (BMI) of 26.6.
For 21,780 subjects, we processed raw MRI data for a surface-based analysis of the cortex using the following four-step procedure. Further details for each step are provided in Supplementary Note, section ‘UK Biobank data processing.’
First, the cortical surfaces were segmented and reconstructed from the MRI volumetric data using recon-all (FreeSurfer86 v.6.0.0; URL section). In this step 20,409 images were processed successfully.
Second, to mirror the minimal preprocessing pipeline of the Human Connectome Project (HCP; URL section), the Connectivity Informatics Technology Initiative (CIFTI) file format tool CIFTIFY (URL section) was used to convert the output of FreeSurfer’s recon-all command to an HCP-style file format and structure87.
Third, from the CIFTIFY output, we selected the mid-cortical surface of the left and right hemispheres, which is the surface that runs midway between the white surface (at the interface between gray and white matter) and the pial surface (the external cortical surface)88. The mid-cortical surface neither over- nor under-represents gyri or sulci89, but is otherwise an arbitrary choice.
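To make the geometry of the mid-cortical surface concrete, the following is a minimal sketch that assumes the white and pial meshes share vertex correspondence, so that the midway surface can be taken as the vertex-wise average of the two. The function name `mid_cortical_surface` is hypothetical, and this illustrates the definition only; it is not a description of how CIFTIFY generates its output.

```python
import numpy as np


def mid_cortical_surface(white, pial):
    """Vertex-wise midpoint between white and pial surfaces.

    Assumption: both inputs are (n_vertices, 3) coordinate arrays in which
    row i of `white` and row i of `pial` describe the same cortical location.
    """
    white = np.asarray(white, dtype=float)
    pial = np.asarray(pial, dtype=float)
    if white.shape != pial.shape:
        raise ValueError("white and pial surfaces must have matching shapes")
    # Each mid-cortical vertex lies halfway along the white-to-pial segment.
    return 0.5 * (white + pial)
```

Because the result keeps the vertex ordering of its inputs, the triangle faces of either input mesh can be reused for the mid-cortical mesh under the same assumption.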
Fourth, as quality control for each hemisphere, we checked the resulting mid-cortical surfaces for mesh artifacts in a semi-automatic manner. All images passed this quality control, yielding 20,407 processed images.
For the list of 20,407 subjects with preprocessed images, we selected genomic data from the UK Biobank, which consisted of the version 3 (March 2018) SNP genotypes, imputed to the Haplotype Reference Consortium and merged UK10K and 1000 Genomes (phase 3) panels. See Supplementary Note, section ‘UK Biobank data processing’ for more details on the filtering of SNPs and individuals based on ancestry and relatedness. This resulted in 9,705,931 filtered SNPs for GWAS analysis on 19,670 unrelated subjects of European descent.
For the list of 19,670 subjects with preprocessed brain and genetic data, we collected the following covariates to control for during statistical testing: genetic sex, age, age-squared, height, weight, diastolic blood pressure, systolic blood pressure, and the first 20 genetic PCs. Furthermore, the following imaging-specific parameters were included, following Elliott et al.90: volumetric scaling from T1 head image to standard space, XYZ-position of brain mask in scanner co-ordinates, Z-position of table/coil in scanner co-ordinates, date of attending assessment center, and assessment center (coded as a dummy variable for each of the 21 centers). See Supplementary Note, section ‘UK Biobank data processing’ for more details on covariate-based filtering of individuals.
Next, to symmetrize brain shape, the right hemisphere was reflected onto the side of the left hemisphere by changing the sign of the x-coordinate for all of the 29,759 3D vertices on the surface of the right hemisphere. We performed a generalized Procrustes superimposition (GPA)91 of all left and reflected right hemispheres pooled together, thus eliminating differences in position, orientation, and scale (measured by centroid size). We computed the symmetric brain component as the vertex-wise average of the paired and superimposed left and right hemispheres.
This resulted in a final discovery dataset of 19,644 subjects containing preprocessed MRI image data on the mid-cortical symmetrized surface, 9,705,931 imputed SNPs, and 54 covariates.
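The symmetrization steps (reflection, pooled superimposition, and vertex-wise averaging of paired hemispheres) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it assumes left and right surfaces share vertex correspondence, uses a textbook iterative generalized Procrustes scheme with SVD-based rotations, and all function names and the `n_iter` and `tol` parameters are hypothetical.

```python
import numpy as np


def reflect_x(vertices):
    """Mirror an (n_vertices, 3) surface by flipping the sign of x."""
    mirrored = np.array(vertices, dtype=float)
    mirrored[:, 0] *= -1.0
    return mirrored


def center_and_scale(shape):
    """Remove position and scale: centroid at origin, unit centroid size."""
    centered = shape - shape.mean(axis=0)
    return centered / np.linalg.norm(centered)


def rotate_onto(shape, target):
    """Least-squares rotation (no reflection) of `shape` onto `target`."""
    u, _, vt = np.linalg.svd(shape.T @ target)
    # Flip one axis if needed so the result is a proper rotation.
    u[:, -1] *= np.sign(np.linalg.det(u @ vt))
    return shape @ (u @ vt)


def generalized_procrustes(shapes, n_iter=10, tol=1e-10):
    """Iteratively align all shapes to their evolving mean shape."""
    aligned = np.stack([center_and_scale(s) for s in shapes])
    mean = aligned[0]
    for _ in range(n_iter):
        aligned = np.stack([rotate_onto(s, mean) for s in aligned])
        new_mean = center_and_scale(aligned.mean(axis=0))
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    return aligned, mean


def symmetric_components(left, right):
    """Per-subject symmetric shape from paired hemispheres.

    left, right: arrays of shape (n_subjects, n_vertices, 3), where
    left[i] and right[i] belong to the same subject.
    """
    right_mirrored = np.stack([reflect_x(r) for r in right])
    pooled = np.concatenate([np.asarray(left, dtype=float), right_mirrored])
    aligned, _ = generalized_procrustes(pooled)
    n_subjects = len(left)
    # Average each subject's left and reflected-right hemisphere vertex-wise.
    return 0.5 * (aligned[:n_subjects] + aligned[n_subjects:])
```

Pooling both hemispheres into a single superimposition, as described in the text, places every left and reflected right surface in one common coordinate frame before the paired averaging.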