Functional enrichment analyses

Nicole Welch
Shashi Shekhar Singh
Ryan Musich
M. Shahid Mansuri
Annette Bellar
Saurabh Mishra
Aruna K. Chelluboyina
Jinendiran Sekar
Amy H. Attaway
Ling Li
Belinda Willard
Troy A. Hornberger
Srinivasan Dasarathy

To account for differences in statistical analyses across the mouse and human exercise phosphoproteomics datasets, comparative functional enrichment analyses were performed using two methods of feature selection and dimensionality reduction: first, using the greatest differences in phosphorylation relative to controls and, second, using the p-value cutoffs specified in each published dataset. To avoid relying on a single pathway algorithm or gene list (Liberzon et al., 2011, 2015), we used the GO (Harris et al., 2004), KEGG (Kanehisa et al., 2017; Kanehisa and Goto, 2000), and Reactome (Fabregat et al., 2018) databases.

IPA (QIAGEN Inc., https://www.qiagenbio-informatics.com/products/ingenuity-pathway-analysis), DAVID (Huang da et al., 2009a, 2009b), Perseus (Cox and Mann, 2012; Tyanova et al., 2016), g:Profiler (https://biit.cs.ut.ee/gprofiler/), and STRING were used for functional enrichment analyses. Because algorithms and gene lists differ between approaches, we combined several tools: IPA (Kramer et al., 2014) provided pathway enrichments; DAVID (Huang da et al., 2009b) provided an annotated gene list and functional enrichment using gene datasets; and Perseus (Tyanova et al., 2016) was used to determine whether expression values of individual phosphoproteins tend to be systematically larger or smaller than the global distribution of expression values (Cox and Mann, 2012). These varied approaches allowed a broad exploration of pathway and biological process enrichment in these integrated datasets.

DAVID functional enrichment analysis was performed for the complete hyperammonemia phosphoproteomics datasets using regulated sites (FDR < 0.05) as the foreground and unregulated sites (FDR ≥ 0.05) as the background. For the phosphoproteomics data subsets (the ‘6hAmAc Only’ sites, the ‘24hAmAc Only’ sites, and the hyperammonemic clusters), DAVID functional enrichment analysis was performed using the entire phosphoproteomics dataset as the background (Huang da et al., 2009a; 2009b).
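The foreground/background split described above can be illustrated with a generic over-representation test. The following is a minimal sketch only: DAVID itself uses a modified Fisher's exact test (the EASE score), whereas this example uses a plain one-sided hypergeometric test. The site identifiers, FDR values, and annotation set are hypothetical, and treating the test universe as foreground plus background is an assumption of the sketch.

```python
from math import comb

FDR_CUTOFF = 0.05  # cutoff stated in the text


def split_sites(site_fdr):
    """Split sites into regulated (FDR < cutoff) and unregulated (FDR >= cutoff)."""
    foreground = {s for s, q in site_fdr.items() if q < FDR_CUTOFF}
    background = {s for s, q in site_fdr.items() if q >= FDR_CUTOFF}
    return foreground, background


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n items from N, of which K carry the annotation."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / total


def enrichment_p(foreground, background, annotated):
    """One-sided over-representation p-value for one annotation term."""
    universe = foreground | background  # assumption: universe = all measured sites
    return hypergeom_upper_tail(
        k=len(annotated & foreground),
        n=len(foreground),
        K=len(annotated & universe),
        N=len(universe),
    )


# Hypothetical phosphosites with hypothetical FDR values.
site_fdr = {"A_S10": 0.01, "B_T20": 0.03, "C_S5": 0.20,
            "D_Y7": 0.50, "E_S1": 0.05, "F_S2": 0.90}
fg, bg = split_sites(site_fdr)
annotated = {"A_S10", "B_T20", "C_S5"}  # hypothetical members of one term
p = enrichment_p(fg, bg, annotated)
print(sorted(fg), round(p, 3))
```

Note that a site with FDR exactly equal to 0.05 falls in the background, matching the FDR ≥ 0.05 definition in the text.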

Canonical pathways shown in figures were filtered for relevance and ordered based on a -log(p-value) ≥ 1.3. Exercise and hyperammonemia data were analyzed in IPA using each dataset's phosphorylated proteins as the background, with the foreground proteins filtered by log fold change, to identify which pathways were enriched among the proteins with the greatest change in expression in each group. Because the statistical approaches and number of samples varied across the published datasets, we performed functional enrichments in IPA on the full datasets by defining the foreground of differentially expressed proteins using two approaches: 1) absolute log2 ratio cutoffs adjusted per dataset (6hAmAc and 24hAmAc > |2.5|; Nighttime, Daytime > |1|; MIC, Treadmill (65% max.), Human > |0.5|) to yield 500-800 proteins and thereby normalize for differences in machines and batch effects; 2) a uniform significance cutoff at the DEpP level of q < 0.05 in each dataset. The background against which enrichment was identified for each hyperammonemia and exercise dataset was the full set of phosphoproteins identified in each project. For the data subsets, i.e., ‘AmAc only’, ‘Exercise only’, and ‘Shared AmAc and Exercise’, the foreground was the DEpP, analyzed against the background of all phosphoproteins within the hyperammonemia datasets (‘AmAc only’), the exercise datasets (‘Exercise only’), or both (‘Shared AmAc and Exercise’ sites).
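The two foreground definitions above can be sketched as simple filters over a per-protein table. This is an illustrative sketch, not the IPA workflow itself: the row layout (`protein`, `log2ratio`, `q`) and the example values are hypothetical, and the pathway threshold assumes the logarithm is base 10, under which -log(p-value) ≥ 1.3 corresponds to roughly p ≤ 0.05.

```python
import math

# Per-dataset absolute log2 ratio cutoffs, as listed in the text.
LOG2_CUTOFFS = {
    "6hAmAc": 2.5, "24hAmAc": 2.5,
    "Nighttime": 1.0, "Daytime": 1.0,
    "MIC": 0.5, "Treadmill (65% max.)": 0.5, "Human": 0.5,
}
Q_CUTOFF = 0.05            # uniform significance cutoff (approach 2)
NEG_LOG_P_THRESHOLD = 1.3  # pathway threshold; base 10 is an assumption


def foreground_by_fold_change(rows, dataset):
    """Approach 1: keep proteins whose |log2 ratio| exceeds the dataset cutoff."""
    cutoff = LOG2_CUTOFFS[dataset]
    return {r["protein"] for r in rows if abs(r["log2ratio"]) > cutoff}


def foreground_by_q(rows):
    """Approach 2: keep proteins with q < 0.05."""
    return {r["protein"] for r in rows if r["q"] < Q_CUTOFF}


def background(rows):
    """Background: every phosphoprotein identified in the dataset."""
    return {r["protein"] for r in rows}


def passes_pathway_threshold(p_value):
    """True if -log10(p) meets the 1.3 threshold."""
    return -math.log10(p_value) >= NEG_LOG_P_THRESHOLD


# Hypothetical rows for illustration only.
rows = [
    {"protein": "P1", "log2ratio": 3.1, "q": 0.20},
    {"protein": "P2", "log2ratio": -2.7, "q": 0.01},
    {"protein": "P3", "log2ratio": 0.4, "q": 0.03},
    {"protein": "P4", "log2ratio": 1.2, "q": 0.60},
]
print(sorted(foreground_by_fold_change(rows, "6hAmAc")))
print(sorted(foreground_by_q(rows)))
print(len(background(rows)))
```

The two approaches generally select different foregrounds from the same table, which is why both were run against the same background.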

Perseus 1D analysis was performed for the hyperammonemia datasets at 6hAmAc and 24hAmAc using all phosphoproteins, without any protein-level significance cutoff, and with the default settings for pathway significance (Benjamini-Hochberg method, FDR < 0.02, removal of duplicate phosphoproteins if more than one site was phosphorylated on the protein).
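The Benjamini-Hochberg correction named above can be sketched in a few lines. This is a generic implementation of the standard procedure, not Perseus code; the p-values are hypothetical, and only the FDR < 0.02 threshold comes from the text.

```python
FDR_THRESHOLD = 0.02  # pathway significance threshold stated in the text


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-pathway p-values for illustration only.
pvalues = [0.001, 0.004, 0.039, 0.041, 0.9]
adjusted = benjamini_hochberg(pvalues)
significant = [i for i, q in enumerate(adjusted) if q < FDR_THRESHOLD]
print(adjusted, significant)
```

The step that collapses multiple phosphosites on the same protein is not shown, because the text does not specify which site is retained.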

g:Profiler analysis was performed only for subsets of shared hyperammonemia and exercise DEpP, using the gene lists of interest against a Homo sapiens genome background.

All canonical pathways that had representation within a dataset or subset are listed in supplementary tables that correspond to each figure that contains functional enrichment analyses.

(http://phomics.jensenlab.org/phospho_enrichment) analysis (Munk et al., 2016) was used to determine the functional enrichment of phospho-proteomics datasets as compared to phospho-proteomics background molecules that were not differentially expressed in the datasets of interest.

The STRING database was queried to show protein-protein network interactions between proteins of interest. Connections between DEpP subsets were identified using the “multiple proteins by names/identifiers” search tab (https://stringdb.org/cgi/input?sessionId=bI6QSAkvXKc1&input_page_active_form=multiple_identifiers).
