Pathway and cluster analysis of gene expression

Brian J. Eddie
Anthony P. Malanoski
Elizabeth L. Onderko
Daniel A. Phillips
Sarah M. Glaven

The R package GAGE 2.26 was used to perform generally applicable gene-set enrichment (GAGE) analysis [28] on subsets of genes identified as having functional or regulatory similarity by five different methods, described below. Pathways and modules identified in the Marinobacter atlanticus CP1 genome by the Kyoto Encyclopedia of Genes and Genomes (KEGG) were obtained using the R package KEGGREST 1.16 [66]. For the KEGG secretory system gene set, we manually curated a set containing only the genes associated with the type VI secretion system (T6SS). Three additional genes (ACP86_11615–ACP86_11625) were also included because they appeared to be co-expressed in an operon with ACP86_16610.
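Locus tag ranges such as the one quoted above have to be expanded into individual gene identifiers before they can be added to a gene set. The following is a minimal sketch of one way to do that, not the authors' code. The helper name `expand_locus_range` is hypothetical, and the step of 5 between consecutive tags is an assumption inferred from the ranges in the text (a three-gene range spans 11615 to 11625).

```python
import re

def expand_locus_range(first, last, step=5):
    """Return every locus tag from `first` to `last`, inclusive.

    Assumes tags have the form PREFIX_NUMBER and that neighbouring genes
    differ by `step` (5 is an assumption, not stated in the protocol).
    """
    pattern = re.compile(r"^(.+_)(\d+)$")
    m_first, m_last = pattern.match(first), pattern.match(last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError("locus tags must share a prefix and end in digits")
    prefix = m_first.group(1)
    width = len(m_first.group(2))  # keep any zero padding
    start, stop = int(m_first.group(2)), int(m_last.group(2))
    if stop < start:
        raise ValueError("range end precedes range start")
    return [f"{prefix}{n:0{width}d}" for n in range(start, stop + step, step) if n <= stop]

# The three extra genes added to the curated T6SS set.
extra_genes = expand_locus_range("ACP86_11615", "ACP86_11625")

# The full curated T6SS list is not given in the text, so only the gene
# named there is used as a seed for this illustration.
t6ss_set = {"ACP86_16610"} | set(extra_genes)
```

The resulting set of identifiers can then be exported in whatever form the enrichment tool expects, for example a named list of character vectors in R.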

The Gene Ontology (GO) terms assigned in the UniProt database to all predicted proteins from the assembly were downloaded and mapped to the corresponding genes in the output of the htseq-count program to facilitate functional analysis.
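The mapping step can be pictured as inverting a gene-to-GO-term table into GO-term gene sets, restricted to genes that appear in the count table. The sketch below assumes the annotations have already been parsed into a dictionary; the function names and the toy identifiers are hypothetical. The only format detail relied on is that htseq-count writes tab-separated `gene_id<TAB>count` lines followed by summary rows whose names begin with `__`.

```python
def read_htseq_gene_ids(lines):
    """Collect gene IDs from htseq-count output, skipping the '__' summary rows."""
    gene_ids = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        gene_id = line.split("\t")[0]
        if gene_id.startswith("__"):  # e.g. __no_feature, __ambiguous
            continue
        gene_ids.append(gene_id)
    return gene_ids

def go_gene_sets(annotations, counted_genes):
    """Invert {gene: [GO terms]} into {GO term: {genes}} for counted genes only."""
    counted = set(counted_genes)
    sets = {}
    for gene_id, go_terms in annotations.items():
        if gene_id not in counted:
            continue
        for term in go_terms:
            sets.setdefault(term, set()).add(gene_id)
    return sets

# Toy input with made-up gene and term names, for illustration only.
toy_counts = ["geneA\t120", "geneB\t7", "__no_feature\t35"]
toy_annotations = {"geneA": ["GO:x", "GO:y"], "geneB": ["GO:y"], "geneC": ["GO:x"]}
toy_sets = go_gene_sets(toy_annotations, read_htseq_gene_ids(toy_counts))
```

Genes that have annotations but no row in the count table (here `geneC`) are dropped, so every gene set refers only to genes with expression data.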

Possible regulatory networks were identified primarily by examining the RegPrecise motif database, which contains transcription factor regulons identified for a group of Oceanospirillales and Alteromonadales species that are relatively closely related to M. atlanticus [67]. Motifs from the database were searched against the region from 350 bases upstream to 50 bases downstream of each predicted translation start site using FIMO from the MEME software suite [68]. Proteins encoded downstream of putative regulatory regions were compared to the proteins in the corresponding RegPrecise regulon to assess the likelihood that the transcription factor exists in this organism and regulates a similar process.
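Extracting the search window around each start site is the step most prone to off-by-one and strand errors, so a small sketch may help. It assumes 1-based coordinates for the first base of the start codon, counts that base as the first of the 50 downstream bases, and treats the sequence as linear (windows are truncated at the sequence ends). The function names are hypothetical and this is not the authors' implementation.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def start_site_window(genome, start, strand, upstream=350, downstream=50):
    """Return the window around a translation start, read 5' to 3' on the gene's strand.

    `start` is the 1-based coordinate of the first base of the start codon.
    The start base itself is counted as the first downstream base (an assumption).
    """
    if strand == "+":
        lo = start - 1 - upstream      # 0-based index of first upstream base
        hi = start - 1 + downstream    # exclusive end
        return genome[max(lo, 0):min(hi, len(genome))]
    if strand == "-":
        lo = start - downstream        # downstream bases lie at lower coordinates
        hi = start + upstream          # upstream bases lie at higher coordinates
        return reverse_complement(genome[max(lo, 0):min(hi, len(genome))])
    raise ValueError("strand must be '+' or '-'")

# Toy checks with short windows so the result is easy to read.
plus_window = start_site_window("A" * 10 + "ATG" + "C" * 10, 11, "+", upstream=3, downstream=3)
minus_window = start_site_window("GGGG" + "CAT" + "TTTT", 7, "-", upstream=2, downstream=3)
```

With the default arguments each window is 400 bases long unless it runs off the end of the sequence. The windows would then be written to a FASTA file and scanned with the motif files.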

The bacterial version of antiSMASH was used to identify secondary metabolite biosynthetic pathways with the default settings (detection strictness set to relaxed; KnownClusterBlast, SubClusterBlast, and ActiveSiteFinder on) [26].

Weighted correlation network analysis (WGCNA) was performed on the samples using the R package WGCNA 1.63 [27]. The regularized log transforms of the normalized read counts from the DESeq2 analysis were used as the input for the analysis. A scale-free topology fit determined the best soft-thresholding power to be 6. The dynamic tree cutting method from the R package generated an initial set of modules that were then merged into a final set of modules. The analysis grouped 1308 genes into 14 different sets. GAGE usually does not provide any significant finding for sets that are very small or very large. For the largest gene set generated, we therefore selected a smaller gene set by filtering for only the genes with the highest potential association to the cluster center, resulting in gene sets wgcna_11 and wgcna_11b.
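To make the role of the soft-thresholding power concrete, the sketch below raises absolute Pearson correlations to the power 6, which is the unsigned adjacency used in weighted correlation networks, and then ranks module members by their correlation with the module's mean profile. Two points are assumptions: the text does not say whether a signed or unsigned network was used, and the mean profile is only a simple stand-in for whatever "cluster center" measure the authors applied. The function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (both must vary)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def soft_threshold_adjacency(profiles, power=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** power for every gene pair."""
    genes = list(profiles)
    return {(g, h): abs(pearson(profiles[g], profiles[h])) ** power
            for g in genes for h in genes}

def top_genes_by_center(profiles, members, keep):
    """Keep the `keep` members most correlated with the module's mean profile."""
    n_samples = len(profiles[members[0]])
    center = [sum(profiles[g][i] for g in members) / len(members)
              for i in range(n_samples)]
    ranked = sorted(members, key=lambda g: pearson(profiles[g], center), reverse=True)
    return ranked[:keep]

# Toy expression profiles (made-up values) across four samples.
toy_profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 2.0, 4.0],
}
toy_adjacency = soft_threshold_adjacency(toy_profiles)
toy_core = top_genes_by_center(toy_profiles, ["g1", "g2", "g4"], keep=2)
```

Raising the correlation to a power above 1 suppresses weak correlations far more than strong ones: a correlation of 0.8 becomes an adjacency of about 0.26, while a correlation of 1 stays at 1.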

We identified a final gene set after an initial examination of the differential expression results. Genes annotated as involved in copper control were noted to have similar expression patterns. Previously published experimental work identified and characterized in depth an operon for copper homeostasis in the closely related species Marinobacter aquaeolei 617 [69]. Prompted by this, we used a BLAST search to find genes with high sequence similarity in our organism. All the genes had a close match in M. atlanticus, with most still located in one operon. Five genes (ACP86_11500, ACP86_11505, and ACP86_11815–ACP86_11825) were in their own operon in M. atlanticus. All the genes were treated as a single gene set for GAGE analysis, and it remains a putative regulon. The copper homeostasis genes flank genes that encode a complex I-like structure and appear to be co-regulated with them; these genes were therefore included in the ref_cop gene set.
