To predict the metabolic capabilities of microbial taxa identified by 16S rRNA analysis, we used a subsystem-based approach implemented in microbial community SEED (mcSEED), an application of the SEED genomic platform (Overbeek et al., 2014). mcSEED has been used to capture, analyze, and extend the pathways, enzymes, and transporters involved in specific subsystems (metabolic pathways) across a reference set of 2,662 genomes representing 690 microbial species from the human gut. The subsystem-based approach implements functional gene annotation and prediction using three comparative genomics techniques: (i) homology-based methods; (ii) genome context analysis; and (iii) co-regulation by the same regulon (Wattam et al., 2014).
The collection of curated metabolic subsystems analyzed in this work includes a subset of catabolic pathways for various sugars (mono-, di-, and oligosaccharides) and amino acids, as well as two short-chain fatty acid (SCFA) synthesis pathways (propionate and butyrate). The analyzed microbial genomes were imported into mcSEED from the PATRIC genomic database (Wattam et al., 2014). The metabolic subsystems were developed based on previously published genomic studies of sugar and amino acid metabolism in various bacterial taxa (Gu et al., 2010; Ravcheev et al., 2013; Rodionova et al., 2013; Arzamasov et al., 2018; Bouvier et al., 2018) and on studies of the phylogenetic distribution of bacterial pathways for the production of butyrate (Vital et al., 2014) and propionate (Reichardt et al., 2014). As a result, each reference genome was assigned, for each analyzed subsystem, a binary phenotype (“1” or “0”) reflecting the presence or absence of a complete sugar/amino acid catabolic pathway or SCFA synthesis pathway. The binary phenotypes of all reference genomes were summarized in a Binary Phenotype Matrix (BPM). In addition to catabolic enzymes, the sugar utilization subsystems also included sugar-specific uptake transporters; thus, assigning a sugar utilization capability required the presence of both the catabolic pathway and the uptake transporter. The distribution of carbohydrate-active enzymes (CAZymes), including 229 families of glycoside hydrolases (GHs) and 57 families of polysaccharide lyases (PLs), in the analyzed reference genomes was obtained using the dbCAN2 tool (Zhang et al., 2018). The resulting CAZyme family distribution was converted into a binary GH-BPM matrix recording the presence or absence of each CAZyme family across the analyzed reference genomes.
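The phenotype-assignment rules above can be illustrated with a minimal sketch. This is not the mcSEED implementation: the functions `any_complete`, `assign_phenotype`, `build_bpm`, and `binarize_cazymes`, the representation of a pathway as alternative sets of functional roles ("variants"), and all genome and role names are hypothetical. The sketch assumes a phenotype is “1” when at least one pathway variant is complete and, for sugar utilization subsystems only, at least one uptake transporter variant is also complete; CAZyme counts per genome (as a dbCAN2 run might yield) are binarized as present (count > 0) or absent.

```python
from typing import Dict, List, Set

# Hypothetical representation: a pathway or transporter "variant" is a set of
# functional roles that must all be encoded in the genome; alternatives are
# listed as separate sets.
Variants = List[Set[str]]


def any_complete(variants: Variants, roles: Set[str]) -> bool:
    """True if at least one variant has all of its roles present in the genome."""
    return any(variant <= roles for variant in variants)


def assign_phenotype(roles: Set[str], spec: Dict[str, Variants]) -> int:
    """Binary phenotype (1/0) for one genome and one subsystem.

    A complete pathway is always required; if the subsystem lists uptake
    transporters (sugar utilization), a complete transporter is required too.
    """
    has_pathway = any_complete(spec["pathway"], roles)
    transporters = spec.get("transporter", [])
    has_uptake = not transporters or any_complete(transporters, roles)
    return int(has_pathway and has_uptake)


def build_bpm(genome_roles: Dict[str, Set[str]],
              subsystems: Dict[str, Dict[str, Variants]]) -> Dict[str, Dict[str, int]]:
    """Binary Phenotype Matrix as a nested dict: genome -> phenotype -> 0/1."""
    return {
        genome: {name: assign_phenotype(roles, spec) for name, spec in subsystems.items()}
        for genome, roles in genome_roles.items()
    }


def binarize_cazymes(counts: Dict[str, Dict[str, int]],
                     families: List[str]) -> Dict[str, Dict[str, int]]:
    """GH-BPM: 1 if a genome has at least one gene in a CAZyme family, else 0."""
    return {
        genome: {fam: int(fam_counts.get(fam, 0) > 0) for fam in families}
        for genome, fam_counts in counts.items()
    }


# Toy subsystems (heavily simplified; role names are illustrative only).
subsystems = {
    "lactose": {"pathway": [{"beta-galactosidase"}],
                "transporter": [{"lactose permease"}]},
    # Two alternative terminal-step variants; no transporter requirement.
    "butyrate": {"pathway": [{"but"}, {"buk", "ptb"}]},
}
genome_roles = {
    "genome_A": {"beta-galactosidase", "lactose permease", "buk", "ptb"},
    "genome_B": {"beta-galactosidase", "but"},  # lacks the lactose transporter
}

bpm = build_bpm(genome_roles, subsystems)
# {'genome_A': {'lactose': 1, 'butyrate': 1}, 'genome_B': {'lactose': 0, 'butyrate': 1}}

gh_bpm = binarize_cazymes({"genome_A": {"GH2": 3}, "genome_B": {"GH43": 1}},
                          ["GH2", "GH43", "PL1"])
# {'genome_A': {'GH2': 1, 'GH43': 0, 'PL1': 0}, 'genome_B': {'GH2': 0, 'GH43': 1, 'PL1': 0}}
```

Note how genome_B receives a lactose phenotype of 0 despite encoding the catabolic enzyme, mirroring the requirement that sugar utilization needs both the pathway and the uptake transporter.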
The BPM and GH-BPM describing metabolic phenotypes and CAZyme family distributions across the 2,662 reference genomes, provided as part of the Phenotype Profiler tool, were used to calculate a community phenotype matrix for all taxa mapped from the 16S analysis, as previously described (Rodionov et al., 2019). The community phenotype index (CPI) for each 16S sample was then calculated as the sum, over all taxa, of each taxon's community phenotype matrix value multiplied by its relative abundance. The CPI thus represents the predicted fraction of cells in the community that possess a specific metabolic pathway or CAZyme family, expressed on a 0–100% scale. The Phenotype Profiler tool for CPI calculation was provided by PhenoBiome Inc. (San Francisco, CA, United States).
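The CPI calculation can be sketched as CPI_p = Σ_t a_t × P_t,p, where a_t is the relative abundance (%) of taxon t and P_t,p is its community phenotype matrix value for phenotype p. The sketch below is hypothetical and is not the Phenotype Profiler code. In particular, it assumes that a taxon's phenotype value is the mean binary phenotype over the reference genomes mapped to it, which is one plausible choice. The referenced method (Rodionov et al., 2019) may differ. Each taxon is assumed to map to at least one genome, and abundances are assumed to be percentages that sum to 100 over the mapped taxa.

```python
from typing import Dict, List

Matrix = Dict[str, Dict[str, float]]


def community_phenotype_matrix(taxon_genomes: Dict[str, List[str]],
                               bpm: Dict[str, Dict[str, int]]) -> Matrix:
    """Hypothetical taxon-level phenotype values in [0, 1]: the mean binary
    phenotype over the reference genomes mapped to each taxon."""
    cpm: Matrix = {}
    for taxon, genomes in taxon_genomes.items():
        phenotypes = bpm[genomes[0]]  # assumes all genomes share the same phenotype keys
        cpm[taxon] = {
            p: sum(bpm[g][p] for g in genomes) / len(genomes) for p in phenotypes
        }
    return cpm


def community_phenotype_index(abundance_pct: Dict[str, float],
                              cpm: Matrix) -> Dict[str, float]:
    """CPI per phenotype: sum over taxa of relative abundance (%) x taxon value.

    With abundances summing to 100 and values in [0, 1], CPI lies in 0-100%.
    """
    phenotypes = next(iter(cpm.values()))
    return {
        p: sum(ab * cpm[taxon][p] for taxon, ab in abundance_pct.items())
        for p in phenotypes
    }


# Toy data: hypothetical genomes, taxa, and abundances.
bpm = {
    "g1": {"lactose": 1, "butyrate": 0},
    "g2": {"lactose": 0, "butyrate": 0},
    "g3": {"lactose": 0, "butyrate": 1},
}
taxon_genomes = {"taxon_X": ["g1", "g2"], "taxon_Y": ["g3"]}
abundance_pct = {"taxon_X": 60.0, "taxon_Y": 40.0}

cpm = community_phenotype_matrix(taxon_genomes, bpm)
# {'taxon_X': {'lactose': 0.5, 'butyrate': 0.0}, 'taxon_Y': {'lactose': 0.0, 'butyrate': 1.0}}
cpi = community_phenotype_index(abundance_pct, cpm)
# {'lactose': 30.0, 'butyrate': 40.0}
```

In this toy sample, 30% of cells are predicted to utilize lactose (half of taxon_X's 60%) and 40% to produce butyrate (all of taxon_Y's 40%).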