Functional annotation and gene-mapping using FUMA

Mats Nagel
Kyoko Watanabe
Sven Stringer
Danielle Posthuma
Sophie van der Sluis

The FUMA GWAS platform20 (http://fuma.ctglab.nl/) uses GWAS summary statistics to functionally map, annotate, prioritize, visualize, and interpret GWAS results. We used the summary statistics from the GWA meta-analyses of the 12 individual items and the neuroticism sum-score as input for FUMA.

FUMA first defined independent significant SNPs as those with a genome-wide significant P value (P < 5 × 10⁻⁸) that were independent of each other at r² < 0.6. Subsequently, lead SNPs were defined by retaining those independent significant SNPs that were independent of each other at r² < 0.1. Next, risk loci were defined by merging lead SNPs whose LD blocks physically overlapped or were less than 250 kb apart. A consequence of this definition of risk loci is that the same locus may be discovered for different phenotypes included in the study, even though the lead SNPs differ.
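The three steps above can be illustrated with a minimal sketch. It is not FUMA's implementation: the greedy ordering by ascending P value, the function names, and the toy data are assumptions made for illustration, and pairwise r² values are assumed to be precomputed from a reference panel.

```python
GWS_P = 5e-8           # genome-wide significance threshold
R2_INDEPENDENT = 0.6   # threshold for independent significant SNPs
R2_LEAD = 0.1          # threshold for lead SNPs
MERGE_BP = 250_000     # LD blocks closer than this are merged into one locus


def greedy_independent(snps, pvals, r2, threshold):
    """Visit SNPs by ascending P value; keep a SNP only if its r2 with
    every SNP kept so far is below the threshold (assumed greedy scheme)."""
    kept = []
    for snp in sorted(snps, key=lambda s: pvals[s]):
        if all(r2.get(frozenset((snp, k)), 0.0) < threshold for k in kept):
            kept.append(snp)
    return kept


def merge_loci(blocks, max_gap_bp=MERGE_BP):
    """Merge (start, end) LD blocks on one chromosome that overlap or lie
    less than max_gap_bp apart."""
    merged = []
    for start, end in sorted(blocks):
        if merged and start - merged[-1][1] < max_gap_bp:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical toy data: P values and pairwise r2 for four SNPs.
pvals = {"rs1": 1e-12, "rs2": 1e-10, "rs3": 3e-9, "rs4": 1e-3}
r2 = {
    frozenset(("rs1", "rs2")): 0.8,
    frozenset(("rs1", "rs3")): 0.3,
    frozenset(("rs2", "rs3")): 0.2,
}

significant = [s for s in pvals if pvals[s] < GWS_P]
independent = greedy_independent(significant, pvals, r2, R2_INDEPENDENT)
lead = greedy_independent(independent, pvals, r2, R2_LEAD)
loci = merge_loci([(1_000_000, 1_050_000), (1_200_000, 1_260_000),
                   (2_000_000, 2_010_000)])
```

In this toy example rs2 is dropped at the first step because it is in strong LD with rs1, and rs3 is dropped at the second step because its r² with rs1 exceeds 0.1. The first two LD blocks are 150 kb apart and therefore fall into one locus.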

All SNPs in LD (r² > 0.6) with one of the independent significant SNPs were used in annotation. Functional consequences were obtained by performing ANNOVAR gene-based annotation using Ensembl genes. In addition, potential regulatory functions were indicated by the RegulomeDB score43 (with lower scores indicating a higher probability of having a regulatory function) and by 15-core chromatin states predicted by ChromHMM44 for 127 tissue/cell types45.

All SNPs in genomic risk loci that were genome-wide significant (GWS), or in LD (r² > 0.6) with one of the independent GWS SNPs, were mapped to genes in FUMA20 using any of three strategies.

The first strategy, positional mapping, was used to map SNPs to genes based on physical distance (i.e., within a 10 kb window) from known protein-coding genes in the human reference assembly (GRCh37/hg19).
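As a rough illustration of positional mapping, the sketch below assigns a SNP to every gene whose body, extended by 10 kb on each side, contains the SNP position. The function name, the tuple layout, and the gene coordinates are hypothetical; FUMA's own handling of gene boundaries may differ.

```python
def positional_map(chrom, pos, genes, window_bp=10_000):
    """Return names of genes on the same chromosome whose span, extended by
    window_bp on both sides, contains the SNP position."""
    return [
        name
        for name, gene_chrom, start, end in genes
        if gene_chrom == chrom and start - window_bp <= pos <= end + window_bp
    ]


# Hypothetical genes as (name, chromosome, start, end) in GRCh37 coordinates.
genes = [
    ("GENE_A", "1", 100_000, 120_000),
    ("GENE_B", "1", 140_000, 150_000),
    ("GENE_C", "2", 100_000, 120_000),
]

near_a = positional_map("1", 128_000, genes)
between = positional_map("1", 130_000, genes)
```

A SNP can map to more than one gene: position 130,000 lies exactly 10 kb from both GENE_A and GENE_B, so both are returned.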

The second strategy, eQTL mapping, was used to link SNPs to genes with which these SNPs show a significant eQTL association (i.e., allelic variation at the SNP is associated with the expression of that gene). This strategy is based on information from three data repositories (GTEx, Blood eQTL browser, and BIOS QTL browser), and uses cis-eQTLs, which can map SNPs to genes up to 1 Mb away. We applied a false discovery rate (FDR) threshold of 0.05 to define significant eQTL associations.
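The eQTL filter can be sketched as follows, assuming each repository record already carries an FDR value for a SNP-gene pair. The record fields, the function name, and the inclusive comparison at 0.05 are assumptions for illustration, not the format of GTEx or the other repositories.

```python
from collections import defaultdict


def eqtl_map(candidate_snps, eqtls, fdr_max=0.05):
    """Map each candidate SNP to the genes with which it has an eQTL
    association at FDR <= fdr_max (inclusive bound is an assumption)."""
    mapped = defaultdict(set)
    for rec in eqtls:
        if rec["snp"] in candidate_snps and rec["fdr"] <= fdr_max:
            mapped[rec["snp"]].add(rec["gene"])
    return {snp: sorted(genes) for snp, genes in mapped.items()}


# Hypothetical eQTL records pooled across tissues.
eqtls = [
    {"snp": "rs1", "gene": "GENE_A", "tissue": "blood", "fdr": 0.001},
    {"snp": "rs1", "gene": "GENE_B", "tissue": "brain", "fdr": 0.20},
    {"snp": "rs3", "gene": "GENE_A", "tissue": "brain", "fdr": 0.04},
    {"snp": "rs9", "gene": "GENE_C", "tissue": "blood", "fdr": 0.001},
]

eqtl_genes = eqtl_map({"rs1", "rs3"}, eqtls)
```

Here rs9 is ignored because it is not a candidate SNP, and the rs1-GENE_B pair is dropped because its FDR exceeds the threshold.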

Finally, using chromatin interaction mapping, SNPs were mapped to genes based on a significant chromatin interaction between a genomic region in a risk locus and promoter regions of genes (250 bp upstream and 500 bp downstream of the transcription start site (TSS)). Unlike eQTL mapping, chromatin interaction mapping has no distance boundary and can involve long-range interactions. Currently, Hi-C data of 14 tissue types are included in FUMA46. Chromatin interactions are generally defined at a given resolution (40 kb in this case), so interacting regions may span multiple genes. All SNPs within these regions would be mapped by this method to genes in the corresponding interaction region. To further prioritize candidate genes from chromatin interaction mapping, we integrated predicted enhancers and promoters in 111 tissue/cell types from the Roadmap Epigenomics Project45; chromatin interactions were selected in which one region involved in the interaction overlaps with predicted enhancers and the other region overlaps with predicted promoters 250 bp upstream and 500 bp downstream of the TSS of a gene. An FDR threshold of 1 × 10⁻⁵ was applied to define significant interactions.
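The promoter window and the overlap test behind chromatin interaction mapping can be sketched as below. The strand-aware definition of upstream and downstream, the data layout, and all names are assumptions for illustration. The enhancer and promoter annotation filter is omitted, and all regions are assumed to lie on one chromosome.

```python
def promoter_window(tss, strand, up=250, down=500):
    """Promoter region as (start, end): `up` bp upstream and `down` bp
    downstream of the TSS, relative to the gene's strand."""
    if strand == "+":
        return (tss - up, tss + down)
    return (tss - down, tss + up)


def overlaps(a, b):
    """True if closed intervals a and b share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def chromatin_map(interactions, genes, fdr_max=1e-5):
    """Return genes whose promoter overlaps the target end of a significant
    interaction; the other end is assumed to lie in a risk locus."""
    mapped = set()
    for inter in interactions:
        if inter["fdr"] > fdr_max:
            continue
        for name, (tss, strand) in genes.items():
            if overlaps(inter["target_region"], promoter_window(tss, strand)):
                mapped.add(name)
    return sorted(mapped)


# Hypothetical 40 kb interaction bins and gene TSS positions.
interactions = [
    {"target_region": (40_000, 80_000), "fdr": 1e-7},
    {"target_region": (200_000, 240_000), "fdr": 1e-3},
]
genes = {
    "GENE_X": (80_200, "+"),
    "GENE_Y": (210_000, "-"),
    "GENE_Z": (90_000, "+"),
}

hic_genes = chromatin_map(interactions, genes)
```

GENE_X is mapped because its promoter window (79,950 to 80,700) reaches into the first bin. GENE_Y sits inside the second bin, but that interaction does not pass the FDR threshold.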
