This approach uses explicit, logic-based ‘pathway rules’ to assign binary phenotypes. These rules combine (1) expert curators’ knowledge of the gene composition of the metabolic pathway variants in the mcSEED database with (2) a decision tree method that identifies patterns of gene representation in reference genomes corresponding to an intact functional pathway variant (assigned the binary phenotype value ‘1’). A total of 106 pathway-specific decision trees were generated with the R package rpart (v.4.1.15; ref. 81). In each tree, the presence or absence of a particular phenotype was the response variable, and the presence or absence of the functional roles (encoded by genes) in the corresponding reference pathway were the predictor variables. The resulting pathway rules were formally encoded in a custom R script that processes MAG gene data and assigns a value (1 or 0) to each of the 106 functional metabolic pathways.
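In a decision tree, every root-to-leaf path that ends in a ‘1’ leaf is a conjunction of presence/absence tests, and the separate ‘1’ paths combine by logical OR. The rule-application step can therefore be sketched as evaluating nested AND/OR/NOT expressions over the set of functional roles detected in a MAG. This is a minimal Python sketch under that assumption, not the authors’ R script: the pathway names, role names, and the tuple-based rule encoding are hypothetical placeholders rather than actual mcSEED content.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A rule is either a functional-role name (true if the role is present)
# or a tuple ("and" | "or" | "not", sub-rule, ...).
Rule = Union[str, Tuple]

# Hypothetical pathway rules for illustration only; the role names are
# placeholders, not real mcSEED functional roles.
PATHWAY_RULES: Dict[str, Rule] = {
    # Requires roleA1 and roleA3, plus either of two alternative roles.
    "PathwayA": ("and", "roleA1", ("or", "roleA2", "roleA2_alt"), "roleA3"),
    # Two alternative variants; the second requires roleB4 to be ABSENT,
    # mimicking a decision-tree split on the absence of a role.
    "PathwayB": ("or", ("and", "roleB1", "roleB2"),
                       ("and", "roleB3", ("not", "roleB4"))),
}


def evaluate(rule: Rule, roles: FrozenSet[str]) -> bool:
    """Return True if the roles present in a genome satisfy the rule."""
    if isinstance(rule, str):
        return rule in roles
    op, *children = rule
    if op == "not":
        (child,) = children  # "not" takes exactly one sub-rule
        return not evaluate(child, roles)
    results = (evaluate(child, roles) for child in children)
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown operator: {op!r}")


def assign_phenotypes(roles: FrozenSet[str],
                      rules: Dict[str, Rule] = PATHWAY_RULES) -> Dict[str, int]:
    """Assign a binary phenotype per pathway (1 = intact variant, 0 = not)."""
    return {name: int(evaluate(rule, roles)) for name, rule in rules.items()}


if __name__ == "__main__":
    mag_roles = frozenset({"roleA1", "roleA2_alt", "roleA3", "roleB1"})
    print(assign_phenotypes(mag_roles))  # {'PathwayA': 1, 'PathwayB': 0}
```

Encoding each rule as data rather than as hard-coded `if` statements keeps the curated logic separate from the evaluator, so the same function can score any number of pathways (106 in this protocol) against each MAG.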