The Python programming language (Python Software Foundation, version 3.6) was used for our analysis. The scikit-learn package (https://github.com/scikit-learn/scikit-learn) (Huang et al., 2018; Teles et al., 2016) was used for machine learning. The models included random forest, LightGBM, decision tree, and gradient boosting decision tree (GBDT). The analysis code used in our research is shown in Appendix S1.
The sample was randomly divided into a training set and a test set at a ratio of 7:3. The parameters of each machine learning model were fitted on the training set and evaluated on the test set. Models were evaluated and compared by prediction accuracy and the area under the receiver operating characteristic curve; we also compared the mean squared error (MSE), accuracy, and recall. Missing data were estimated through multiple imputation.
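The 7:3 split described above can be sketched as follows; the feature matrix and labels here are hypothetical placeholders, not the study data.

```python
# Sketch of a 7:3 train/test split; X and y are synthetic stand-ins
# for the study dataset.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))       # placeholder feature matrix
y = rng.integers(0, 2, size=100)    # placeholder binary labels

# test_size=0.3 reproduces the 7:3 ratio from the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
print(X_train.shape, X_test.shape)  # 70 training vs 30 test samples
```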
The F1-measure is an evaluation indicator often used in information retrieval and natural language processing. It is a comprehensive index combining precision and recall, defined as:

F1 = 2PR / (P + R),

where R is the recall and P is the precision.
Precision indicates the proportion of cases predicted as positive that are truly positive.
Accuracy indicates the number of correctly classified cases divided by the total number of cases.
Recall indicates the proportion of positive cases in the sample that were predicted correctly.
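The metrics defined above can be computed with scikit-learn; the labels below are hypothetical, chosen only to illustrate the calculation.

```python
# Illustration of precision, recall, F1 and accuracy on hypothetical labels.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # 3 TP, 1 FN, 1 FP, 3 TN vs y_pred
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

P = precision_score(y_true, y_pred)   # TP / (TP + FP)
R = recall_score(y_true, y_pred)      # TP / (TP + FN)
F1 = f1_score(y_true, y_pred)         # 2PR / (P + R)
acc = accuracy_score(y_true, y_pred)  # correct / total
print(P, R, F1, acc)
```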
In machine learning, a random forest is a classifier composed of multiple decision trees; its output category is the mode of the categories output by the individual trees.
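A minimal random forest sketch, using a synthetic dataset in place of the study data:

```python
# Random forest: an ensemble of decision trees whose majority vote
# (mode of the per-tree outputs) gives the predicted class.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))  # each prediction is the trees' majority class
```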
The LightGBM algorithm is a boosting machine learning algorithm: a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms. It can perform ranking, classification, regression, and many other machine learning tasks.
The construction of a decision tree model has two steps: induction and pruning. Induction constructs the decision tree by setting hierarchical decision boundaries based on the data at hand. However, a fully grown tree is prone to severe over-fitting, which is when pruning is required. Pruning removes unnecessary branches from the decision tree, which mitigates over-fitting and makes the tree easier to interpret.
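The two steps above can be sketched with scikit-learn, using cost-complexity pruning (the `ccp_alpha` value here is an arbitrary illustration) on a synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Induction: grow an unrestricted tree (prone to over-fitting).
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pruning: cost-complexity pruning removes branches that contribute
# little, yielding a smaller, more interpretable tree.
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.02,
                                     random_state=0).fit(X, y)

print(full_tree.get_n_leaves(), pruned_tree.get_n_leaves())
```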
Boosting is a machine learning technique that can be used for regression and classification problems. At each step it produces a weak prediction model (such as a decision tree) and adds it, with a weight, to the overall model. When each step's weak model is fitted along the gradient direction of the loss function, the method is called gradient boosting; with decision trees as the weak learners, this yields the gradient boosting decision tree (GBDT).
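A GBDT sketch with scikit-learn's GradientBoostingClassifier, again on synthetic data rather than the study dataset:

```python
# Gradient boosting: each stage fits a small tree toward the negative
# gradient of the loss and adds it, scaled by the learning rate, to
# the ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, random_state=0)

gbdt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                  max_depth=3, random_state=0)
gbdt.fit(X, y)
print(gbdt.score(X, y))
```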