iSet for analysis of stratified cohorts

Francesco Paolo Casale
Danilo Horta
Barbara Rakitsch
Oliver Stegle

To study the performance of iSet in interaction analyses of stratified cohorts, we considered simulation experiments analogous to those for fully observed designs. We generated a synthetic cohort of 2,000 Europeans in which each individual was phenotyped in only one of two contexts. For each individual, the phenotyped context was selected independently by a draw from a symmetric Bernoulli distribution (success probability 0.5). Statistical calibration and power simulations were performed analogously to the approach used for fully observed designs. Population structure was accounted for using the first ten principal components of the realized relatedness matrix as fixed effect covariates. We did not consider tests for heterogeneity-GxC, as differential tagging of causal variants could potentially result in spurious heterogeneity-GxC signals, and hence additional controls would be required. However, in principle the test applies to stratified populations.
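The stratified design and the population-structure covariates described above can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function names, the toy genotype matrix, and the choice to standardize variants before computing the relatedness matrix are assumptions made for the example.

```python
import numpy as np

def assign_contexts(n_individuals, p=0.5, seed=0):
    """Draw one phenotyped context (0 or 1) per individual from Bernoulli(p)."""
    rng = np.random.default_rng(seed)
    return rng.binomial(1, p, size=n_individuals)

def relatedness_pcs(genotypes, n_pcs=10):
    """Top principal components of the realized relatedness matrix.

    Assumption: the relatedness matrix is K = G G^T / n_variants, with G the
    genotype matrix after standardizing each variant to mean 0, variance 1.
    """
    G = np.asarray(genotypes, dtype=float)
    G = G - G.mean(axis=0)
    sd = G.std(axis=0)
    keep = sd > 0                      # drop monomorphic variants
    G = G[:, keep] / sd[keep]
    K = G @ G.T / G.shape[1]
    eigvals, eigvecs = np.linalg.eigh(K)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_pcs]    # indices of the largest ones
    return eigvecs[:, top]

# Toy data (hypothetical sizes, much smaller than the 2,000 simulated individuals).
toy_rng = np.random.default_rng(1)
toy_genotypes = toy_rng.binomial(2, 0.3, size=(200, 500))

context = assign_contexts(200)
pcs = relatedness_pcs(toy_genotypes, n_pcs=10)

# Each individual has a phenotype in exactly one of the two contexts.
observed_mask = np.column_stack([context == 0, context == 1])
```

In this sketch `pcs` would enter the model as fixed effect covariates, and `observed_mask` marks which of the two context-specific phenotypes is available for each individual.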

We compared iSet to the single-variant interaction test of [10] (mtLMM-int) and to the gene-environment set association test (GESAT) [13]. The latter approach is representative of a family of closely related set tests that can only be applied to test for interaction effects in stratified populations (see S1 Text). As an additional comparison, we extended the single-variant interaction test in [10] to stratified cohorts. To the best of our knowledge there are currently no implementations of mtLMM-int that can be applied to such designs. The models are available within the LIMIX package [48] (for full details see S1 Text). GESAT was run using the function GESAT of the package iSKAT version 1.2. Both iSet and GESAT were applied to identically processed, standardized variants.

We performed a genotype-sex interaction analysis of four blood lipid phenotypes (C-reactive protein [CRP], triglycerides [TRIGL], LDL and HDL cholesterol levels) measured in 5,256 unrelated individuals from the NFBC1966 cohort [20] (phs000276.v1.p1). Following [11, 12], we regressed out major covariates, following a quantile-normalization of each trait individually. To correct for population structure, we considered the first ten principal components of the realized relatedness matrix as fixed effect covariates.

We applied mtSet and iSet to 318,653 genome-wide variants with an allele frequency of at least 1% using a sliding-window approach (100 kb regions, 50 kb step size, resulting in 52,819 windows overall; S4 Fig). For comparison we considered the single-variant interaction test [10], GESAT [13] and stSet [8], a univariate set test without stratification by sex. For each window we considered 100 permutations for mtSet and stSet and 100 parametric bootstraps for iSet, and combined the obtained null LLRs across windows and traits (for a total of 21,127,600 null LLRs per test) to obtain empirical P values. Significance of the considered statistical tests was assessed at FWER = 10%. Summary results from all considered methods are reported in S6 Table.
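The windowing and the pooled empirical P values can be illustrated with a short sketch. It is not the authors' implementation: the function names are hypothetical, the handling of chromosome ends is an assumption, and the add-one correction in the P value estimate is a common convention that the text does not specify.

```python
import numpy as np

def sliding_windows(chrom_length, size=100_000, step=50_000):
    """Return (start, end) windows of `size` bp, shifted by `step` bp.

    Assumption: only windows lying fully inside the chromosome are kept.
    """
    last_start = max(chrom_length - size, 0)
    return [(s, s + size) for s in range(0, last_start + 1, step)]

def empirical_pvalues(observed_llr, null_llr):
    """Empirical P values from a pooled set of null log likelihood ratios.

    Assumption: P = (#null >= observed + 1) / (#null + 1), which avoids
    reporting a P value of exactly zero.
    """
    null_sorted = np.sort(np.asarray(null_llr, dtype=float))
    observed = np.asarray(observed_llr, dtype=float)
    n_below = np.searchsorted(null_sorted, observed, side="left")
    n_greater_equal = null_sorted.size - n_below
    return (n_greater_equal + 1) / (null_sorted.size + 1)

# Size of the pooled null: windows x null draws per window x traits.
n_null = 52_819 * 100 * 4

# Toy usage with hypothetical numbers.
windows = sliding_windows(250_000)
pvals = empirical_pvalues([2.5], [0.0, 1.0, 2.0, 3.0])
```

With 52,819 windows, 100 null draws per window and four traits, `n_null` reproduces the 21,127,600 null LLRs per test quoted above. Pooling across windows and traits is what allows P values far below 1/100 to be resolved from only 100 draws per window.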

Genotype data from NFBC1966 (phs000276.v1.p1) were imputed using the 1000 Genomes Project phase 3 reference panel as follows. After aligning the dataset to the reference panel, we ran shapeit v2.r727 [50] with recommended parameters on each chromosome to produce haplotype estimates. We used impute2 v2.3.2 [51] with recommended parameters to impute untyped genotypes. Imputation was performed on chunks of approximately 5 Mb. We merged regions with fewer than 200 SNPs and avoided regions that span the centromere.
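One way to derive such imputation chunks is sketched below. This is an illustration under stated assumptions rather than the procedure actually used: the function name is hypothetical, sparse chunks are merged into the preceding chunk on the same arm, and centromere-spanning chunks are avoided by calling the function separately for each chromosome arm.

```python
import numpy as np

def imputation_chunks(positions, arm_start, arm_end,
                      chunk_size=5_000_000, min_snps=200):
    """Split one chromosome arm [arm_start, arm_end) into ~chunk_size chunks.

    A chunk with fewer than `min_snps` typed SNPs is merged with its
    left neighbour (assumption; the source does not state the direction).
    """
    positions = np.sort(np.asarray(positions))

    def n_snps(start, end):
        # number of SNP positions p with start <= p < end
        lo = np.searchsorted(positions, start, side="left")
        hi = np.searchsorted(positions, end, side="left")
        return int(hi - lo)

    edges = list(range(arm_start, arm_end, chunk_size)) + [arm_end]
    merged = []
    for start, end in zip(edges[:-1], edges[1:]):
        sparse_now = n_snps(start, end) < min_snps
        sparse_prev = bool(merged) and n_snps(*merged[-1]) < min_snps
        if merged and (sparse_now or sparse_prev):
            merged[-1][1] = end          # extend the previous chunk
        else:
            merged.append([start, end])
    return [tuple(chunk) for chunk in merged]

# Toy arm of 15 Mb with a SNP-poor middle chunk (hypothetical positions).
toy_positions = np.concatenate([
    np.linspace(0, 4_999_999, 300),
    np.linspace(5_000_000, 9_999_999, 50),
    np.linspace(10_000_000, 14_999_999, 300),
])
chunks = imputation_chunks(toy_positions, 0, 15_000_000)
```

In the toy example the middle 5 Mb chunk holds only 50 SNPs, so it is absorbed into the first chunk and two regions remain.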
