We assessed the performance of the classifiers by tracing their overall accuracy as a function of the number of selected predictors. For SVM-RFE and random forests, we took each training data set and its corresponding ranking of genes by importance, and trained a series of classifiers using an increasing number of the top-ranked genes. We then evaluated those models on the corresponding test data and averaged the results across the 50 test data sets. For L1-penalised logistic regression, gene selection was accomplished by varying the regularisation parameter λ. We used each of the 50 training data sets to construct a series of logistic regression models, one per value of λ, and used the corresponding test data sets to estimate the accuracy of each model. For each λ value, we averaged both the accuracy and the number of genes selected across the 50 data sets.
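The evaluation loop described above for SVM-RFE and random forests can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `accuracy_curve`, `fit_predict`, and the data containers are hypothetical names, and any classifier could sit behind `fit_predict`.

```python
from statistics import mean


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_curve(splits, rankings, sizes, fit_predict):
    """Mean test accuracy for each number of top-ranked genes.

    splits      -- list of (train, test) pairs (hypothetical containers)
    rankings    -- one gene ranking per split, most important gene first
    sizes       -- numbers of top-ranked genes to evaluate
    fit_predict -- hypothetical callable(train, test, genes) that trains a
                   classifier on the chosen genes and returns (y_true, y_pred)
                   for the test set
    """
    curve = {}
    for k in sizes:
        per_split = []
        for (train, test), ranking in zip(splits, rankings):
            # Use only the k top-ranked genes of this split's own ranking.
            y_true, y_pred = fit_predict(train, test, ranking[:k])
            per_split.append(accuracy(y_true, y_pred))
        # Average over all train/test splits (50 in the text).
        curve[k] = mean(per_split)
    return curve
```

The same averaging would apply to the L1-penalised models, with the loop running over λ values instead of gene counts and the number of selected genes averaged alongside the accuracy.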
The overall accuracy of a classifier is not very informative by itself because it does not tell us how well each cancer type is classified. Therefore, we also considered precision and recall. For multiclass classification, the precision and recall of a cancer type i are defined as:
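The defining equations do not appear in this text. The standard per-class formulation is given here as an assumption about what was intended; the symbols are introduced for illustration and are not the authors' notation. TP_i is the number of samples of type i predicted as type i, FP_i is the number of samples of other types predicted as type i, and FN_i is the number of samples of type i predicted as another type.

```latex
\[
\mathrm{precision}_i = \frac{TP_i}{TP_i + FP_i},
\qquad
\mathrm{recall}_i = \frac{TP_i}{TP_i + FN_i}
\]
```

Under this formulation, precision measures how many of the samples assigned to type i truly belong to it, and recall measures how many of the samples of type i are recovered.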
In all calculations, we computed the 95 % confidence interval of each quantity by multiplying the standard deviation of its estimate based on the 50 values by .
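A minimal sketch of this interval calculation follows. The multiplier is not stated in this text, so it is left as an explicit parameter; the value 1.96 used in the usage note is an assumption (the usual normal quantile for a 95% interval), not a value taken from the source. The function name `ci_half_width` is hypothetical, and the sample standard deviation is assumed.

```python
from statistics import stdev


def ci_half_width(values, factor):
    """Half-width of a confidence interval: factor times the sample
    standard deviation of the values.

    values -- per-split estimates of a quantity (50 values in the text)
    factor -- multiplier; NOT given in the source, so the caller must supply it
    """
    return factor * stdev(values)
```

For example, `ci_half_width(accuracies, 1.96)` would give the half-width under the assumed normal quantile, where `accuracies` is a hypothetical list holding the 50 per-split accuracy estimates.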