Machine Learning to Determine Potential Metabolic Engineering Targets

Albert Enrique Tafur Rangel
Wendy Ríos
Daisy Mejía
Carmen Ojeda
Ross Carlson
Jorge Mario Gómez Ramírez
Andrés Fernando González Barrios

Random forest models are supervised machine learning models that have the advantage of summarizing the importance of each variable. The approach is based on a randomized variable selection process. Variable importance is estimated by IncNodePurity, which measures the increase in tree node purity (that is, the decrease in node impurity) that results from all splits on a given variable over all trees (Li et al., 2015). For interpretation purposes, this measure can be used to rank variables by the strength of their relation to the response variable (Li et al., 2015). A binary matrix was built with m rows, one per predicted mutant, and n columns, one per reaction in the set of reactions that could be knocked out. In this matrix, one indicates that a reaction is among those deleted in the mutant, and zero that it is not. The matrix was partitioned into training and test sets; the training set was used to build random forest models with succinic acid production, growth rate, or the growth rate Euclidean distance between the mutant and wild-type (WT) strains as the response variable. For the training set, the succinic acid production and growth rate response variables were first predicted using flux balance analysis (FBA), and the growth rate Euclidean distance between the mutant and WT strains was predicted using minimization of metabolic adjustment (MOMA). Next, model performance was assessed using the test set. Finally, we used the random forest to determine the importance of each target reaction for each of the three response variables.
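The workflow above can be illustrated with a minimal sketch in pure Python. It is not the authors' implementation: the reaction names (R1 to R6), the synthetic response that stands in for FBA or MOMA predictions, the 90/30 split, and all function names are assumptions made for illustration. The importance score is the summed decrease in residual sum of squares per variable, averaged over trees, which is how the R randomForest package defines IncNodePurity for regression.

```python
import random

def rss(ys):
    """Residual sum of squares around the mean (regression node impurity)."""
    mean = sum(ys) / len(ys)
    return sum((v - mean) ** 2 for v in ys)

def grow_tree(X, y, rows, mtry, rng, importance):
    """Grow one regression tree on 0/1 features; add each split's RSS drop to importance."""
    ys = [y[i] for i in rows]
    node_rss = rss(ys)
    best = None
    if len(rows) >= 2 and node_rss > 0.0:
        for j in rng.sample(range(len(X[0])), mtry):  # randomized variable selection
            left = [i for i in rows if X[i][j] == 0]
            right = [i for i in rows if X[i][j] == 1]
            if left and right:
                gain = node_rss - rss([y[i] for i in left]) - rss([y[i] for i in right])
                if best is None or gain > best[0]:
                    best = (gain, j, left, right)
    if best is None:
        return sum(ys) / len(ys)  # leaf: mean response of the node
    gain, j, left, right = best
    importance[j] += gain
    return (j, grow_tree(X, y, left, mtry, rng, importance),
            grow_tree(X, y, right, mtry, rng, importance))

def fit_forest(X, y, n_trees=100, seed=0):
    """Fit bootstrap-sampled trees; return the trees and per-variable mean RSS decrease."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = max(1, p // 3)  # assumed setting: one third of the variables per split
    importance = [0.0] * p
    trees = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of mutants
        trees.append(grow_tree(X, y, rows, mtry, rng, importance))
    return trees, [v / n_trees for v in importance]

def predict(trees, x):
    """Average the leaf values reached by x across all trees."""
    total = 0.0
    for node in trees:
        while isinstance(node, tuple):
            j, left, right = node
            node = right if x[j] == 1 else left
        total += node
    return total / len(trees)

# Hypothetical candidate reactions and randomly generated knockout sets (mutants).
reactions = ["R1", "R2", "R3", "R4", "R5", "R6"]
data_rng = random.Random(1)
mutants = [{r for r in reactions if data_rng.random() < 0.5} for _ in range(120)]

# Binary matrix: one row per mutant, 1 if the reaction is deleted in that mutant.
X = [[1 if r in ko else 0 for r in reactions] for ko in mutants]

# Synthetic response standing in for an FBA or MOMA prediction (assumption, not real data).
y = [2.0 * ("R2" in ko) + 1.5 * ("R5" in ko) + data_rng.gauss(0, 0.1) for ko in mutants]

# Partition into training and test sets, fit, assess, and rank the reactions.
X_train, y_train, X_test, y_test = X[:90], y[:90], X[90:], y[90:]
trees, importance = fit_forest(X_train, y_train)
test_mse = sum((predict(trees, x) - t) ** 2 for x, t in zip(X_test, y_test)) / len(y_test)
ranking = sorted(zip(reactions, importance), key=lambda item: -item[1])

print("test MSE:", round(test_mse, 3))
for name, score in ranking:
    print(name, round(score, 2))
```

In this toy setting the two reactions that drive the synthetic response should rank first, mirroring how the importance ranking is used to shortlist candidate knockout targets. In practice the same steps would be repeated once per response variable, with the responses coming from the FBA and MOMA simulations rather than from a synthetic formula.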
