Pathways were extracted from the MSigDB v5.2 canonical pathways (CP) and Gene Ontology (GO) collections; MSigDB [28] is notable for having the largest collection of gene sets, derived from diverse sources. Only pathways containing 10–1000 genes were included, yielding a total of 7111 pathways (1309 CP, 5802 GO) for the analysis. We compiled gene sets from our association data by assigning variants with MAF ≥ 0.05 to genes using a window of 35 kb upstream and 10 kb downstream, so that gene regulatory regions were included. Genes were encoded by ENSEMBL identifiers (release 75, genome assembly hg19). Pathways were assigned competitive p-values using MAGMA v1.05 [29], which tests whether the genes in a pathway are more strongly associated with the trait than genes outside it, and takes linkage disequilibrium (LD) into account. The reference data used for LD was the Southern Han Chinese (CHS) subset of the 1000 Genomes Project phase 3 data [30]. The gene and pathway p-values were adjusted using the Benjamini-Hochberg FDR procedure [31] to obtain q-values. In silico tissue-specific expression of the top genes from the association and VEGAS2 analyses was examined using the freely available online database, the Genotype-Tissue Expression (GTEx) Portal (http://www.gtexportal.org/).
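As an illustration of two of the steps described above, the following minimal Python sketch shows the 10–1000 gene size filter and the Benjamini-Hochberg adjustment that converts p-values to q-values. It is not the authors' code or part of MAGMA: the text does not state which software performed the FDR adjustment, the function names `filter_pathways_by_size` and `bh_qvalues` are hypothetical, the example identifiers and p-values are invented for demonstration, and counting unique genes per pathway is an assumption.

```python
def filter_pathways_by_size(pathways, min_genes=10, max_genes=1000):
    """Keep gene sets whose number of unique genes lies in [min_genes, max_genes].

    `pathways` maps a pathway name to an iterable of gene identifiers.
    Counting unique genes is an assumption of this sketch.
    """
    return {
        name: genes
        for name, genes in pathways.items()
        if min_genes <= len(set(genes)) <= max_genes
    }


def bh_qvalues(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that each
    # q-value is the minimum of p * m / rank over this and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    # Invented toy data, for demonstration only.
    toy_pathways = {
        "TOO_SMALL": ["ENSG_A", "ENSG_B"],
        "KEPT": ["ENSG_%d" % i for i in range(25)],
    }
    print(sorted(filter_pathways_by_size(toy_pathways)))
    print(bh_qvalues([0.01, 0.04, 0.03, 0.005]))
```

The backward pass enforces the monotonicity of the adjusted values, so a pathway with a smaller p-value never receives a larger q-value than one with a larger p-value.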