
RapidMiner Studio version 9.5 (WIN64 platform) was registered to Jack Cheng and was executed under the Windows 10 operating system on an Intel Core i3-3220 CPU with 16 GB RAM. In addition to each sample's age and sex, the 9969 profiled genes were assigned as regular attributes (the potential contributing factors analyzed by the modeling operators). The disease status (1 = AD; 0 = non-AD CTRL) was assigned as the label attribute (the class to be predicted by the modeling operator), and the sample ID was assigned as the ID attribute (identifying each sample). The input matrix is supplied as Supplementary File 1.
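RapidMiner assigns these attribute roles through its GUI, so there is no code to reproduce; the following minimal pandas sketch (with a hypothetical toy table standing in for Supplementary File 1) only illustrates the equivalent split into ID, label, and regular attributes.

```python
import pandas as pd

# Hypothetical toy stand-in for the input matrix (Supplementary File 1):
# rows are samples; columns hold the sample ID, age, sex, disease status,
# and expression values for the profiled genes.
data = pd.DataFrame({
    "sample_id": ["S1", "S2", "S3", "S4"],
    "age": [72, 68, 75, 70],
    "sex": [1, 0, 0, 1],
    "status": [1, 0, 1, 0],  # 1 = AD, 0 = non-AD CTRL
    "GENE_A": [2.1, 0.4, 1.8, 0.5],
    "GENE_B": [0.3, 1.9, 0.2, 2.0],
})

# Mirror RapidMiner's attribute roles:
ids = data["sample_id"]                          # ID attribute
y = data["status"]                               # label attribute (predicted class)
X = data.drop(columns=["sample_id", "status"])   # regular attributes (age, sex, genes)
```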

Seven predictive operators or combinations of RapidMiner Studio operators were used to establish predictive models from the input matrix and assign a weight to each attribute: (1) AdaBoost + Decision Tree, (2) AdaBoost + Rule Induction, (3) AdaBoost + Decision Stump, (4) Generalized Linear Model, (5) Logistic Regression, (6) Gradient Boosted Trees, and (7) Random Forest + Weight by Tree Importance. The parameters of these operators are listed in the Parameters sheet of Supplementary File 2. Notably, in the Random Forest model, the number of trees was 500 and the maximal depth was set to -1, which places no bound on the depth of the trees. Moreover, the Generalized Linear Model was a regularized GLM with an elastic net penalty for parameter regularization. Other operators under the Models/Predictive category were excluded from this study for the reasons listed in the Models sheet of Supplementary File 2.

Each model's performance was estimated by cross-validation, which contains two subprocesses: a training subprocess and a testing subprocess. The training subprocess produces a trained model, which is then applied in the testing subprocess for performance evaluation. In this study, the samples were randomly divided into ten subsets of equal size. Each of the ten subsets was used in turn in the testing subprocess to evaluate the model trained on the other nine subsets. The convergence of each model's iterations was recorded and summarized in Supplementary File 3, which describes how genes were aggregated across these iterations. The performance of a model can be evaluated by its accuracy, precision, and recall, where accuracy = (TP + TN)/(TP + FP + FN + TN), precision = TP/(TP + FP), and recall = TP/(TP + FN), with T = true, F = false, P = positive, and N = negative. The setup diagrams of the seven predictive models are illustrated in Supplementary File 4.
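The ten-fold scheme above can be sketched in scikit-learn as follows; this is an illustrative analog on hypothetical toy data, not the protocol's RapidMiner process, with a Random Forest chosen arbitrarily as the learner.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold

# Toy data standing in for the gene-expression input matrix.
X, y = make_classification(n_samples=100, n_features=20, random_state=0)

# Ten equal folds: each fold serves once as the testing subprocess while
# the other nine folds form the training subprocess.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
acc, prec, rec = [], [], []
for train_idx, test_idx in cv.split(X, y):
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    acc.append(accuracy_score(y[test_idx], pred))    # (TP+TN)/(TP+FP+FN+TN)
    prec.append(precision_score(y[test_idx], pred))  # TP/(TP+FP)
    rec.append(recall_score(y[test_idx], pred))      # TP/(TP+FN)

mean_accuracy = sum(acc) / len(acc)
```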
