Estonian Biobank

Jonas Bovijn, Kristi Krebs, Chia-Yen Chen, Ruth Boxall, Jenny C. Censin, Teresa Ferreira, Sara L. Pulit, Craig A. Glastonbury, Samantha Laber, Iona Y. Millwood, Kuang Lin, Liming Li, Zhengming Chen, Lili Milani, George Davey Smith, Robin G. Walters, Reedik Mägi, Benjamin M. Neale, Cecilia M. Lindgren, Michael V. Holmes

The Estonian Genome Center of the University of Tartu (EGCUT) biobank is a population-based biobank containing longitudinal data and biological samples, including DNA, for 5% of the adult population of Estonia. The broad informed consent signed by biobank participants allows the Estonian Genome Center to update their records continuously through periodic linkage to central electronic health record (EHR) databases and registries. We studied genotypic and phenotypic data from 51,881 individuals; after removing relatives (PI_HAT > 0.2), 36,073 individuals (65% women and 35% men; mean age, 45 years) were included in further analyses. For the coronary heart disease (CHD) analyses in EGCUT, we included only participants who had not previously been included in the CARDIoGRAMplusC4D consortium.
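The relatedness filter can be illustrated with a short Python sketch. It assumes pairwise PI_HAT estimates are already available as (sample, sample, PI_HAT) tuples, for example parsed from a PLINK `--genome` report. The function name `prune_relatives` and the greedy rule of repeatedly dropping the sample with the most remaining relatives are illustrative choices, not the study's documented procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def prune_relatives(samples: Iterable[str],
                    pairs: Iterable[Tuple[str, str, float]],
                    threshold: float = 0.2) -> List[str]:
    """Remove one member of every pair with PI_HAT > threshold (greedy heuristic)."""
    related: Dict[str, Set[str]] = defaultdict(set)
    for a, b, pi_hat in pairs:
        if pi_hat > threshold:
            related[a].add(b)
            related[b].add(a)

    removed: Set[str] = set()
    while any(related.values()):
        # Drop the sample with the most relatives still present
        # (ties broken by ID so the result is deterministic).
        worst = max(related, key=lambda s: (len(related[s]), s))
        for other in related[worst]:
            related[other].discard(worst)
        del related[worst]
        removed.add(worst)

    return [s for s in samples if s not in removed]
```

Under this heuristic, a sample related to several others is removed once instead of removing each of its relatives, which tends to retain more unrelated individuals.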

Of all the studied biobank participants, 33,155 have been genotyped using the Global Screening Array, 8137 using the HumanOmniExpress BeadChip, 2640 using the HumanCNV370-Duo BeadChip, and 6861 using the Infinium CoreExome-24 BeadChip, all from Illumina. In addition, the whole genomes of 2056 individuals have been sequenced at the Genomics Platform of the Broad Institute.

Sequenced reads were aligned to the GRCh37/hg19 version of the human reference genome using BWA-MEM1 v0.7.7; polymerase chain reaction (PCR) duplicates were marked using Picard (http://broadinstitute.github.io/picard) v1.136, and the Genome Analysis Toolkit (GATK) v3.4-46 was used for further processing of BAM files and for genotype calling. All insertions and deletions (indels) in the Variant Call Format (VCF) files were normalized, and multiallelic sites were split, using bcftools (https://samtools.github.io/bcftools/bcftools.html). Genotypes were set to missing if they had a genotype quality of <20, a read depth of >200, or, for heterozygous calls, an allele balance of <0.2 or >0.8. GATK's variant quality score recalibration (VQSR) was used to filter variants at a truth sensitivity of 99.8% for single nucleotide variants (SNVs) and 99.9% for indels. Furthermore, variants with an inbreeding coefficient of < −0.3, a quality by depth of <2 for SNVs or <3 for indels, a call rate of <95%, or a Hardy-Weinberg equilibrium (HWE) P < 1 × 10⁻⁶ were excluded.
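The genotype-level and site-level thresholds above can be expressed as a minimal Python sketch. The `Call` record, its fields, and the definition of allele balance as alternate reads over total reads at a biallelic (post-split) site are assumptions made for illustration; in practice these values would come from the FORMAT and INFO fields of the GATK VCF. The VQSR step is applied by GATK itself and the HWE P value is taken as an input, so neither is reimplemented here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Call:
    """Hypothetical per-sample genotype record for one biallelic site."""
    gt: Optional[Tuple[int, int]]  # e.g. (0, 1); None means missing
    gq: int                        # genotype quality
    dp: int                        # read depth
    ad: Tuple[int, int]            # allelic depths (ref, alt)


def mask_call(call: Call) -> Call:
    """Return the call with its genotype set to missing if it fails any threshold."""
    if call.gt is None:
        return call
    failed = call.gq < 20 or call.dp > 200
    if call.gt[0] != call.gt[1]:  # heterozygous: also check allele balance
        total = call.ad[0] + call.ad[1]
        balance = call.ad[1] / total if total else 0.0
        failed = failed or balance < 0.2 or balance > 0.8
    return Call(None, call.gq, call.dp, call.ad) if failed else call


def keep_site(is_snv: bool, inbreeding_coeff: float, qd: float,
              calls: List[Call], hwe_p: float) -> bool:
    """Apply the site-level exclusions; call rate is computed after masking."""
    masked = [mask_call(c) for c in calls]
    call_rate = sum(c.gt is not None for c in masked) / len(masked)
    min_qd = 2.0 if is_snv else 3.0  # quality-by-depth cutoff differs for indels
    return (inbreeding_coeff >= -0.3 and qd >= min_qd
            and call_rate >= 0.95 and hwe_p >= 1e-6)
```

Computing the call rate after genotype masking is itself an assumption; the text does not state the order in which the two filters were applied.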

Genotype calling for the Illumina microarrays was performed using Illumina's GenomeStudio V2010.3 software. Genotype calls for rare variants on the Global Screening Array (GSA) were corrected using zCall (version of 8 May 2012). After genotype calling, the data were filtered using PLINK (v1.90), retaining samples with a call rate of >95%, no sex mismatch between phenotype and genotype data, and heterozygosity within the mean ± 3 SE, and retaining markers with HWE P > 1 × 10⁻⁶ and a call rate of >95% (plus, for the GSA, an Illumina GenomeStudio GenTrain score of >0.6 and a cluster separation score of >0.4). Before imputation, variants with a minor allele frequency (MAF) of <1%, strand-ambiguous C/G and T/A polymorphisms, and indels were removed, because these genotype calls do not allow precise phasing and imputation. Genotype data from each array were phased separately using Eagle2 (v2.3) and imputed using BEAGLE (v4.1) with a joint Estonian and Finnish reference panel (93).
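The pre-imputation variant filter can be sketched in Python as follows. The function names and the input representation (REF/ALT allele strings, and genotypes coded as alternate-allele counts 0, 1, or 2 with `None` for missing) are assumptions for illustration; the text specifies only the criteria, not how they were implemented.

```python
from typing import Iterable, Optional

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def minor_allele_frequency(genotypes: Iterable[Optional[int]]) -> float:
    """MAF from alternate-allele counts (0, 1, 2); missing genotypes are skipped."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def keep_for_imputation(ref: str, alt: str, maf: float) -> bool:
    """Drop indels, strand-ambiguous (A/T, C/G) SNVs, and variants with MAF < 1%."""
    if len(ref) != 1 or len(alt) != 1:
        return False  # indel or multi-base allele
    if ref not in COMPLEMENT or alt not in COMPLEMENT:
        return False  # non-ACGT allele code
    if COMPLEMENT[ref] == alt:
        return False  # A/T or C/G: strand cannot be resolved from the alleles alone
    return maf >= 0.01
```

A/T and C/G SNVs are excluded because their two alleles are each other's complements, so a strand flip between the array and the reference panel cannot be detected from the alleles.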

We tested the associations of rs7209826 and rs188810925 with fracture (ICD-10 codes S52.5, S82.6, S22.3, S42.2, S52.6, S22.4, S42.0, S82.8, S72.0, S71.1, S52.1, S32.0, S52.0, S82.4, S82.3, S72.1, S82.5, S22.0, S82.0, S82.7, S82.2, S32.5, S32.2, S42.3, S52.2, S52.3, S42.4, S72.3, S52.8, S22.2, S52.4, S42.1, S72.2, S72.4, S32.1, S22.1, S12.2, S32.7, S32.8, S32.4, S82.9, S32.3, S52.9, S12.1, S42.8, S12.7, S72.8, S42.7, S72.9, S22.5, S72.7, S12.0, and S42.9), osteoporosis (ICD-10 codes M80 and M81), prevalent coronary artery disease (ICD-10 codes I20, I21, I22, I23, I24, and I25), myocardial infarction (ICD-10 codes I21, I22, and I25.2), and systolic blood pressure (SBP; measured at recruitment). For all outcomes except cardiovascular disease, cases comprised participants with a prevalent case status reported at recruitment and individuals with diagnosis codes recorded in the electronic registries before recruitment. For cardiovascular disease outcomes, only prevalent cases reported at recruitment were considered.
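The case definitions can be sketched in Python as below. The record format (a list of (ICD-10 code, diagnosis date) pairs per participant), the dotted code format such as "I21.9", and the helper name `is_prevalent_case` are assumptions. The fracture code list is omitted for brevity, but it would be passed as a tuple of prefixes in the same way.

```python
from datetime import date
from typing import Iterable, Tuple

# Code prefixes from the text; dotted ICD-10 codes are assumed (e.g. "I21.9").
OSTEOPOROSIS = ("M80", "M81")
CORONARY_ARTERY_DISEASE = ("I20", "I21", "I22", "I23", "I24", "I25")
MYOCARDIAL_INFARCTION = ("I21", "I22", "I25.2")


def is_prevalent_case(records: Iterable[Tuple[str, date]],
                      recruitment: date,
                      reported_at_recruitment: bool,
                      prefixes: Tuple[str, ...],
                      use_registry: bool = True) -> bool:
    """Case if reported at recruitment or, when registry data are used,
    if a matching code was recorded before the recruitment date."""
    if reported_at_recruitment:
        return True
    if not use_registry:
        # Cardiovascular outcomes: only the status reported at recruitment counts.
        return False
    # str.startswith accepts a tuple, so any listed prefix matches.
    return any(code.startswith(prefixes) and when < recruitment
               for code, when in records)
```

For the cardiovascular outcomes, the text implies calling the function with `use_registry=False`, so that only the status reported at recruitment determines case status.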

Analyses in EGCUT were approved by the Ethics Review Committee of the University of Tartu (243 T-12).
