Published: Jun 20, 2022 DOI: 10.21769/BioProtoc.4446 Views: 2620
Edited by: Jinfeng Chen Reviewed by: Yizhou Wang
Abstract
In RNA-seq data analysis, functional enrichment analysis of genes has become routine, and many enrichment analysis software packages and web applications have emerged. However, gene annotation information is easily accessible only for the most well-studied organisms, such as human and mouse, and is lacking for some plant species. With poor gene annotation information, performing a functional enrichment analysis is challenging. Here, I use rice, a model plant organism, as an example to show how to obtain comprehensive Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation for enrichment analysis. I obtain the gene annotation information from two sources: (1) public rice annotation databases, including RAP-DB and OryzaBase; and (2) AnnotationHub, an R package containing gene annotation information for various species. I use the clusterProfiler R package for the enrichment calculation and result visualization. This protocol can be used directly for GO/KEGG enrichment analysis of gene lists from rice, and can also serve as a reference for similar analyses in other plant species.
Keywords: KEGG
Background
RNA-seq data analysis has been streamlined, and functional enrichment analysis is a critical step for drawing biological insights from the results. Enrichment analysis, or over-representation analysis, examines whether a gene ontology term or a biological pathway occurs in the target gene list more often than expected by chance. Many tools bundle annotation files with enrichment test functions to streamline this process. However, some plant species still lack gene annotation information, which can be an obstacle to functional enrichment analysis. For instance, only 20 GO annotation databases were available under OrgDb from Bioconductor, and only one of them covers a plant species, Arabidopsis. In this protocol, I focus on performing functional enrichment analysis on genes of rice, a model organism for the grass family, using clusterProfiler (Yu et al., 2012), one of the most commonly used R packages for enrichment analysis. I provide step-by-step instructions using annotation information obtained in two different ways. The scripts are mainly R scripts, with some Bash command lines for curating a GO annotation file.
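To make "more than expected by chance" concrete, the following is a minimal Python sketch of a one-sided hypergeometric test, a test commonly used for over-representation analysis. It is an illustration only and is not taken from the protocol, whose scripts are written in R; the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided hypergeometric p-value, P(X >= k).

    N: number of genes in the background (universe)
    K: number of background genes annotated with the term
    n: number of genes in the target list
    k: number of target genes annotated with the term
    """
    upper = min(K, n)  # the overlap cannot exceed either set size
    total = comb(N, n)  # number of ways to draw n genes from the background
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts: 20,000 background genes, 200 carry the term,
# 500 genes in the target list, 15 of which carry the term.
# The overlap expected by chance is 500 * 200 / 20000 = 5 genes.
p_value = hypergeom_enrichment_p(N=20000, K=200, n=500, k=15)
print(f"p-value for observing at least 15 annotated genes: {p_value:.3g}")
```

In practice, a p-value like this is computed for every term and then corrected for multiple testing, which is why supplying an appropriate background gene list matters: it sets N and K for every term.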
Software
clusterProfiler (Yu et al., 2012; v3.16.1; https://guangchuangyu.github.io/software/clusterProfiler/documentation/)
GO.db (Carlson et al., 2019; v3.11.4; https://bioconductor.org/packages/release/data/annotation/html/GO.db.html)
AnnotationHub (Morgan et al., 2021; v2.20.1; https://bioconductor.org/packages/release/bioc/vignettes/AnnotationHub/inst/doc/AnnotationHub.html)
dplyr (R package, v1.0.7)
data.table (R package, v1.14.0)
ggplot2 (R package, v3.3.5)
Input data:
Target gene list (genes.txt) and background gene list (bkgd.txt; optional but recommended). The gene IDs in this protocol are RAP IDs, e.g., Os01g0102500 and Os01g0106300.
The gene annotation file obtained from The Rice Annotation Project Database (RAP-DB), which includes the GO annotation information and the RAP gene ID to transcript ID conversion information. This file is a large data table in which each row is an individual transcript ID and each column holds one type of gene annotation. The "GO" column contains the GO annotations, which are extracted into self-curated annotation files.
https://rapdb.dna.affrc.go.jp/download/archive/irgsp1/IRGSP-1.0_representative_annotation_2021-11-11.tsv.gz
The gene annotation file from the OryzaBase website. This file is also a large data table, where each row is for a "Trait Gene ID", with annotations of "RAP ID" and "Gene Ontology", which are used for generating self-curated annotation files. https://shigen.nig.ac.jp/rice/oryzabase/download/gene
RAP ID to Entrez ID conversion table from the He Lab at Fujian Agriculture and Forestry University, China. http://bioinformatics.fafu.edu.cn/riceidtable/
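As a rough illustration of how GO annotations can be pulled out of a transcript-level table such as the RAP-DB file, here is a minimal Python sketch; the protocol itself does this with R and Bash. The gene column name "Locus_ID", the assumption that GO IDs appear in the "GO" column as "GO:" followed by seven digits, and the RAP ID pattern are assumptions made for this sketch. The sample rows are invented for illustration and are not real annotations.

```python
import csv
import io
import re

GO_ID = re.compile(r"GO:\d{7}")           # assumed GO ID format inside the "GO" column
RAP_ID = re.compile(r"^Os\d{2}g\d{7}$")   # pattern inferred from IDs such as Os01g0102500


def gene_to_go_pairs(tsv_handle, gene_col="Locus_ID", go_col="GO"):
    """Return sorted, de-duplicated (GO ID, gene ID) pairs from a TSV table.

    gene_col is a hypothetical column name; adjust it to the actual file header.
    """
    pairs = set()
    reader = csv.DictReader(tsv_handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        gene = (row.get(gene_col) or "").strip()
        if not RAP_ID.match(gene):
            continue  # skip rows without a well-formed RAP gene ID
        for go_id in GO_ID.findall(row.get(go_col) or ""):
            pairs.add((go_id, gene))  # several transcripts of one gene collapse here
    return sorted(pairs)


# Invented rows for illustration only; these are NOT real annotations.
SAMPLE = (
    "Transcript_ID\tLocus_ID\tGO\n"
    "Os01t0102500-01\tOs01g0102500\tMolecular Function: protein binding (GO:0005515), "
    "Biological Process: transport (GO:0006810)\n"
    "Os01t0102500-02\tOs01g0102500\tMolecular Function: protein binding (GO:0005515)\n"
    "Os01t0106300-01\tOs01g0106300\t\n"
)

pairs = gene_to_go_pairs(io.StringIO(SAMPLE))
for go_id, gene in pairs:
    print(go_id, gene, sep="\t")
```

The pairs are written term first and gene second because a two-column term-to-gene table in that order is the layout clusterProfiler expects for user-supplied annotations. For the real, gzip-compressed file, the handle could be opened with gzip.open(path, "rt") instead of io.StringIO.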
Procedure
Category
Plant Science > Plant molecular biology
Plant Science
Systems Biology > Transcriptomics > RNA-seq