All association analyses were carried out using the R-package seqMeta [50]. Each study ran the “prepScores” function and adjusted their analyses for age, gender, body mass index (BMI), height, principal components, and study-specific covariates when appropriate (details in Additional file 1: Table S1). The output of this function is an R “list” object (“a prepScores object”), stored in an .RData file, where each element corresponds to a gene, and contains the scores and MAFs for variants, as well as a matrix of the covariance between the scores at all pairs of SNPs within a gene. All studies performed both gender combined and separated analyses, in addition to separation by ancestry. Using the prepScores objects from each study, we performed meta-analyses using the “singlesnpMeta()” for single variant meta-analyses, and the “burdenMeta” and “skatMeta()” functions of SeqMeta. Coefficients and standard errors from seqMeta can be interpreted as a “one-step” approximation to the maximum likelihood estimates. Ancestry groups were analyzed both separate and combined at the meta-analysis level.
For single variant meta-analyses, we included all variants with a MAC ≥ 10 in order to have well-calibrated type I error rates [51]. Statistical significance was defined using Bonferroni corrections. For single variants, maximally 162,199 variants were included in five separate analyses after filtering for MAC: European and African ancestry separated and combined (n = 3); and sex-stratified analyses (n = 2), resulting in a Bonferroni corrected P value of α=0.05 / 162,199 variants / 5 analyses = 6.17 × 10−8.
Suggestive sexually dimorphic associations were identified by performing sex-stratified meta-analyses, totaling 39,907 women and 31,702 men, including only from cohorts that had both male and female samples. Variants were deemed to be suggestive sex-specific when reaching below a P value threshold of exome-wide significance (P < 6.17 × 10−8) in one sex and above nominal significance in the other (P > 0.05).
For gene-based tests, also performed using seqMeta using the “prepScores” objects from individual cohorts, we assigned variants to genes by annotating all variants on the Exome Chip using ANNOVAR [52] following RefSeq [53] gene definitions mapped to human genome build 37 (hg19). In the collapsed variant tests, we included only variants with MAF < 1% and included only genes for which two or more variants were present (n = 16,085). We performed both SKAT [54] and T1 burden [55] tests, for three different functional sets of variants limited to the following: (I) all variants; (II) missense, nonsense, splice, and indel variants; (III) “damaging”: the same variants as in group II, except for missense only including those that are predicted to be damaging by at least two out of four functional prediction algorithms (Polyphen2 [56], SIFT [57], Mutation Taster [58], and LRT [59]). For the gene-based tests, we used a Bonferroni corrected P value significance threshold of α=0.05 / 16,085 genes / 2 different tests / 3 functional variant classes = 5.18 × 10−7.
We define a physically independent locus as the genomic region that contains variants within 250 kb on either side of LD-independent lead SNPs (exome-wide significant variants with r2 < 0.1), where LD calculations were based on European ancestry. Following this definition, in certain cases LD-independent lead variants are present in overlapping regions, complicating the definition and reporting of associated genetic loci and harbored genes. Therefore, we annealed loci if LD-independent exome-wide significant variants were < 250 kb from each other. Where lead SNPs from previous analyses were not contained in these regions, we considered these as novel. LD calculations were performed on the Illumina Exome Chip genotype data from the TwinsUK cohort [60] (n = 1194), using PLINK 1.9 [61].
Do you have any questions about this protocol?
Post your question to gather feedback from the community. We will also invite the authors of this article to respond.