Assessing classifier performance

Kee Pang Soh; Ewa Szczurek; Thomas Sakoparnig; Niko Beerenwinkel

This method is extracted from research article: Genome Med, Nov 2017

Predicting cancer type from tumour DNA signatures

DOI: 10.1186/s13073-017-0493-2

We assessed classifier performance by tracing overall accuracy as a function of the number of selected predictors. For SVM-RFE and random forests, we took each training data set and its corresponding ranking of genes by importance and trained a series of classifiers on an increasing number of the top-ranked genes. We then evaluated these models on the corresponding test data and averaged the results across the 50 test data sets. For $L_1$-penalised logistic regression, gene selection was accomplished by varying the regularisation parameter λ: we used each of the 50 training data sets to construct a series of logistic regression models over a range of λ values, and the corresponding test data sets were then used to estimate the accuracy of each model. For each λ value, we averaged both the accuracies and the numbers of selected genes across the 50 test data sets.
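The evaluation loop for the ranking-based methods can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the data layout (lists of rows) and the nearest-centroid classifier are all assumptions, with the nearest-centroid model standing in for SVM-RFE or a random forest purely to keep the example self-contained.

```python
from statistics import mean


def select_columns(X, cols):
    """Keep only the given gene (column) indices from each sample row."""
    return [[row[c] for c in cols] for row in X]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_curve(splits, sizes, fit, predict):
    """Mean test accuracy for each number of top-ranked genes.

    splits: list of (X_train, y_train, X_test, y_test, ranking) tuples, where
            ranking lists gene indices from most to least important.
    fit, predict: hypothetical callables wrapping any classifier.
    """
    curve = {}
    for k in sizes:
        accs = []
        for X_tr, y_tr, X_te, y_te, ranking in splits:
            cols = ranking[:k]  # top-k genes for this training set
            model = fit(select_columns(X_tr, cols), y_tr)
            preds = predict(model, select_columns(X_te, cols))
            accs.append(accuracy(y_te, preds))
        curve[k] = mean(accs)  # average over all train/test splits
    return curve


# Stand-in classifier (assumption): nearest centroid, NOT SVM-RFE or random forest.
def fit_centroids(X, y):
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {lab: [mean(col) for col in zip(*rows)] for lab, rows in groups.items()}


def predict_centroids(model, X):
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(model, key=lambda lab: sq_dist(row, model[lab])) for row in X]
```

With 50 splits and a grid of gene counts in `sizes`, `accuracy_curve(splits, sizes, fit_centroids, predict_centroids)` would return one averaged accuracy per gene count. For the penalised logistic regression, the same loop would iterate over λ values instead of `sizes` and would also record the number of genes with non-zero coefficients.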

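The overall accuracy of a classifier is not very informative by itself because it does not show how well each individual cancer type is classified. We therefore also considered precision and recall. For multiclass classification, the precision and recall of a cancer type i are defined as:

$$\text{precision}_i = \frac{\text{number of samples correctly classified as cancer type } i}{\text{number of samples classified as cancer type } i}$$

$$\text{recall}_i = \frac{\text{number of samples correctly classified as cancer type } i}{\text{number of samples of cancer type } i}$$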
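Per-class precision and recall can be computed directly from the true and predicted labels. The sketch below uses the standard multiclass definitions; the function name and the toy labels are illustrative assumptions rather than anything taken from the article.

```python
from collections import Counter


def precision_recall(y_true, y_pred):
    """Return {class: (precision, recall)} for every class seen in the labels."""
    actual = Counter(y_true)      # number of samples of each class
    predicted = Counter(y_pred)   # number of samples classified as each class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    result = {}
    for c in sorted(set(y_true) | set(y_pred)):
        # Undefined ratios (class never predicted / never present) become NaN.
        prec = correct[c] / predicted[c] if predicted[c] else float("nan")
        rec = correct[c] / actual[c] if actual[c] else float("nan")
        result[c] = (prec, rec)
    return result


# Toy labels (hypothetical cancer types "A" and "B").
scores = precision_recall(["A", "A", "A", "B"], ["A", "A", "B", "B"])
```

In this toy case every sample classified as "A" is correct, but one true "A" sample is missed, so "A" has perfect precision and lower recall, while "B" shows the opposite pattern.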

In all calculations, we computed the 95% confidence interval of each quantity by multiplying the standard deviation of its estimate over the 50 values by $\pm 1.96 / \sqrt{50}$, which gives the half-width of the interval around the mean of those 50 values.
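As a worked illustration of this interval, the following sketch computes the mean and its normal-approximation 95% bounds from a list of per-split values. The function name is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which estimator was used.

```python
from math import sqrt
from statistics import mean, stdev


def mean_ci95(values):
    """Return (mean, lower, upper) using mean +/- 1.96 * sd / sqrt(n)."""
    n = len(values)
    m = mean(values)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    half_width = 1.96 * stdev(values) / sqrt(n)
    return m, m - half_width, m + half_width


# Hypothetical accuracies from 50 test data sets.
accuracies = [0.80, 0.90] * 25
m, lower, upper = mean_ci95(accuracies)
```

The same function applies unchanged to per-class precision or recall values, since each is likewise estimated once per test data set.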

Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
