Statistical methods

BP Bram P. Prins
TM Timothy J. Mead
JB Jennifer A. Brody
GS Gardar Sveinbjornsson
IN Ioanna Ntalla
NB Nathan A. Bihlmeyer
MB Marten van den Berg
JB Jette Bork-Jensen
SC Stefania Cappellani
SD Stefan Van Duijvenboden
NK Nikolai T. Klena
GG George C. Gabriel
XL Xiaoqin Liu
CG Cagri Gulec
NG Niels Grarup
JH Jeffrey Haessler
LH Leanne M. Hall
AI Annamaria Iorio
AI Aaron Isaacs
RL Ruifang Li-Gao
HL Honghuang Lin
CL Ching-Ti Liu
LL Leo-Pekka Lyytikäinen
JM Jonathan Marten
HM Hao Mei
MM Martina Müller-Nurasyid
MO Michele Orini
SP Sandosh Padmanabhan
FR Farid Radmanesh
JR Julia Ramirez
AR Antonietta Robino
MS Molly Schwartz
JS Jessica van Setten
AS Albert V. Smith
NV Niek Verweij
HW Helen R. Warren
SW Stefan Weiss
AA Alvaro Alonso
DA David O. Arnar
MB Michiel L. Bots
RB Rudolf A. de Boer
AD Anna F. Dominiczak
ME Mark Eijgelsheim
PE Patrick T. Ellinor
XG Xiuqing Guo
SF Stephan B. Felix
TH Tamara B. Harris
CH Caroline Hayward
SH Susan R. Heckbert
PH Paul L. Huang
request Request a Protocol
ask Ask a question
Favorite

All association analyses were carried out using the R-package seqMeta [50]. Each study ran the “prepScores” function and adjusted their analyses for age, gender, body mass index (BMI), height, principal components, and study-specific covariates when appropriate (details in Additional file 1: Table S1). The output of this function is an R “list” object (“a prepScores object”), stored in an .RData file, where each element corresponds to a gene, and contains the scores and MAFs for variants, as well as a matrix of the covariance between the scores at all pairs of SNPs within a gene. All studies performed both gender combined and separated analyses, in addition to separation by ancestry. Using the prepScores objects from each study, we performed meta-analyses using the “singlesnpMeta()” for single variant meta-analyses, and the “burdenMeta” and “skatMeta()” functions of SeqMeta. Coefficients and standard errors from seqMeta can be interpreted as a “one-step” approximation to the maximum likelihood estimates. Ancestry groups were analyzed both separate and combined at the meta-analysis level.

For single variant meta-analyses, we included all variants with a MAC ≥ 10 in order to have well-calibrated type I error rates [51]. Statistical significance was defined using Bonferroni corrections. For single variants, maximally 162,199 variants were included in five separate analyses after filtering for MAC: European and African ancestry separated and combined (n = 3); and sex-stratified analyses (n = 2), resulting in a Bonferroni corrected P value of α=0.05 / 162,199 variants / 5 analyses = 6.17 × 10−8.

Suggestive sexually dimorphic associations were identified by performing sex-stratified meta-analyses, totaling 39,907 women and 31,702 men, including only from cohorts that had both male and female samples. Variants were deemed to be suggestive sex-specific when reaching below a P value threshold of exome-wide significance (P < 6.17 × 10−8) in one sex and above nominal significance in the other (P > 0.05).

For gene-based tests, also performed using seqMeta using the “prepScores” objects from individual cohorts, we assigned variants to genes by annotating all variants on the Exome Chip using ANNOVAR [52] following RefSeq [53] gene definitions mapped to human genome build 37 (hg19). In the collapsed variant tests, we included only variants with MAF < 1% and included only genes for which two or more variants were present (n = 16,085). We performed both SKAT [54] and T1 burden [55] tests, for three different functional sets of variants limited to the following: (I) all variants; (II) missense, nonsense, splice, and indel variants; (III) “damaging”: the same variants as in group II, except for missense only including those that are predicted to be damaging by at least two out of four functional prediction algorithms (Polyphen2 [56], SIFT [57], Mutation Taster [58], and LRT [59]). For the gene-based tests, we used a Bonferroni corrected P value significance threshold of α=0.05 / 16,085 genes / 2 different tests / 3 functional variant classes = 5.18 × 10−7.

We define a physically independent locus as the genomic region that contains variants within 250 kb on either side of LD-independent lead SNPs (exome-wide significant variants with r2 < 0.1), where LD calculations were based on European ancestry. Following this definition, in certain cases LD-independent lead variants are present in overlapping regions, complicating the definition and reporting of associated genetic loci and harbored genes. Therefore, we annealed loci if LD-independent exome-wide significant variants were < 250 kb from each other. Where lead SNPs from previous analyses were not contained in these regions, we considered these as novel. LD calculations were performed on the Illumina Exome Chip genotype data from the TwinsUK cohort [60] (n = 1194), using PLINK 1.9 [61].

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

post Post a Question
0 Q&A