Classification and validation

Kunpeng Cai, Hong Xu, Hao Guan, Wanlin Zhu, Jiyang Jiang, Yue Cui, Jicong Zhang, Tao Liu, Wei Wen

To validate whether different classifiers would produce significantly different results, we evaluated the performance of three classification algorithms, Naïve Bayes, Logistic Regression and SVM, all available in the Waikato Environment for Knowledge Analysis (WEKA) package [45]. After an optimal feature subset was selected, Monte Carlo cross-validation (MCCV) was used to evaluate the methods [46]. MCCV differs from k-fold cross-validation in that it draws a new random partition independently for each run, allowing many more possible splits of the data. MCCV divides the dataset into a training set and a test set by sampling without replacement. In our experiment, the split ratio was set to 7:3, so 70% of the dataset formed the training set and the remaining 30% formed the test set. This process was repeated 10 times. In each run, the classifier was trained on the training set and then validated on the test set. For each classifier, the final classification result was the average over these 10 independent experiments. An overview of the classification procedure is shown in Fig 1.
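The loop below is a minimal sketch of this Monte Carlo cross-validation procedure. It uses scikit-learn classifiers as stand-ins for the WEKA implementations named above; the feature matrix `X` (already restricted to the selected feature subset), the label vector `y`, and the classifier hyperparameters are assumptions, not taken from the original protocol.

```python
# Sketch: 10 independent 70/30 splits, averaging accuracy per classifier.
# X and y are placeholders for the selected features and group labels.
import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def monte_carlo_cv(X, y, n_runs=10, train_ratio=0.7, seed=0):
    """Repeat an independent train/test split n_runs times and average accuracy."""
    classifiers = {
        "Naive Bayes": GaussianNB(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "SVM": SVC(kernel="linear"),
    }
    # Each split is drawn independently by sampling without replacement.
    splitter = ShuffleSplit(n_splits=n_runs, train_size=train_ratio,
                            random_state=seed)
    scores = {name: [] for name in classifiers}
    for train_idx, test_idx in splitter.split(X):
        for name, clf in classifiers.items():
            clf.fit(X[train_idx], y[train_idx])
            y_pred = clf.predict(X[test_idx])
            scores[name].append(accuracy_score(y[test_idx], y_pred))
    # Final result per classifier: mean over the independent runs.
    return {name: float(np.mean(vals)) for name, vals in scores.items()}
```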

The results of the different classifiers were compared using the performance metrics accuracy, sensitivity and specificity. In addition, we plotted receiver operating characteristic (ROC) curves, and the areas under the ROC curves (AUC) were calculated by applying the trapezoidal rule to the curve of the True Positive Rate (TPR) against the False Positive Rate (FPR).
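The following sketch shows how these metrics can be computed for a single run, assuming `y_true` (true labels), `y_pred` (predicted labels) and `y_score` (decision scores for the positive class) come from a classifier such as the ones trained above; the function name and variable names are illustrative only.

```python
# Sketch: accuracy, sensitivity, specificity and trapezoidal AUC for one run.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_curve

def evaluate(y_true, y_pred, y_score):
    # Confusion matrix gives true/false positives and negatives.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)       # true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    # ROC curve: TPR against FPR; AUC via the trapezoidal rule.
    fpr, tpr, _ = roc_curve(y_true, y_score)
    auc = np.trapz(tpr, fpr)
    return accuracy, sensitivity, specificity, auc
```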
