The Python programming language (Python Software Foundation, version 3.6) was used for our analysis. The scikit-learn package (https://github.com/scikit-learn/scikit-learn) (Huang et al., 2018; Teles et al., 2016) was used for machine learning. The models included random forest, LightGBM, decision tree, and gradient boosting decision tree (GBDT). The analysis code used in our research is shown in Appendix S1.
The sample was randomly divided into a training set and a test set at a ratio of 7:3. The parameters of each machine learning model were fitted on the training set and evaluated on the test set. Models were evaluated and compared by prediction accuracy and the area under the receiver operating characteristic curve; we also compared the mean squared error (MSE), accuracy, and recall. Missing data were estimated through multiple imputation.
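The 7:3 split described above can be sketched as follows; the feature matrix and labels here are hypothetical placeholders, not the study data.

```python
# Sketch of a 7:3 train/test split; X and y are synthetic stand-ins
# for the study dataset.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))       # placeholder feature matrix
y = rng.integers(0, 2, size=100)    # placeholder binary labels

# test_size=0.3 reproduces the 7:3 ratio from the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
print(X_train.shape, X_test.shape)  # 70 training vs 30 test samples
```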
The F1-measure is an evaluation indicator often used in information retrieval and natural language processing. It is a comprehensive index combining precision and recall, defined as:

F1 = 2PR / (P + R),

where R is the recall and P is the precision.
Precision indicates the proportion of cases predicted as positive that are truly positive.
Accuracy indicates the number of correctly classified cases divided by the total number of cases.
Recall indicates the proportion of positive cases in the sample that were predicted correctly.
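The metrics defined above can be computed with scikit-learn; the labels below are hypothetical, chosen only to illustrate the calculation.

```python
# Illustration of precision, recall, F1 and accuracy on hypothetical labels.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # 3 TP, 1 FN, 1 FP, 3 TN vs y_pred
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

P = precision_score(y_true, y_pred)   # TP / (TP + FP)
R = recall_score(y_true, y_pred)      # TP / (TP + FN)
F1 = f1_score(y_true, y_pred)         # 2PR / (P + R)
acc = accuracy_score(y_true, y_pred)  # correct / total
print(P, R, F1, acc)
```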
In machine learning, a random forest is a classifier composed of multiple decision trees; its output category is the mode of the categories output by the individual trees.
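A minimal random forest sketch, using a synthetic dataset in place of the study data:

```python
# Random forest: an ensemble of decision trees whose majority vote
# (mode of the per-tree outputs) gives the predicted class.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))  # each prediction is the trees' majority class
```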
The LightGBM algorithm is a boosting machine learning algorithm: a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms. It can perform ranking, classification, regression, and many other machine learning tasks.
The construction of a decision tree model has two steps: induction and pruning. Induction constructs the decision tree by setting hierarchical decision boundaries based on the data at hand. However, a fully grown tree is prone to severe over-fitting, which is when pruning is required. Pruning removes unnecessary branches from the decision tree, which mitigates over-fitting and makes the tree easier to interpret.
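The two steps above can be sketched with scikit-learn, using cost-complexity pruning (the `ccp_alpha` value here is an arbitrary illustration) on a synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Induction: grow an unrestricted tree (prone to over-fitting).
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pruning: cost-complexity pruning removes branches that contribute
# little, yielding a smaller, more interpretable tree.
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.02,
                                     random_state=0).fit(X, y)

print(full_tree.get_n_leaves(), pruned_tree.get_n_leaves())
```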
Boosting is a machine learning technique that can be used for regression and classification problems. At each step it produces a weak prediction model (such as a decision tree) and adds it, with a weight, to the overall model. When each step's weak model is fitted along the gradient direction of the loss function, the method is called gradient boosting; with decision trees as the weak learners, this yields the gradient boosting decision tree (GBDT).
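A GBDT sketch with scikit-learn's GradientBoostingClassifier, again on synthetic data rather than the study dataset:

```python
# Gradient boosting: each stage fits a small tree toward the negative
# gradient of the loss and adds it, scaled by the learning rate, to
# the ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, random_state=0)

gbdt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                  max_depth=3, random_state=0)
gbdt.fit(X, y)
print(gbdt.score(X, y))
```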