Testing the ability of the classifier development method to work well for a dataset with very many, but few useful, features: predicting ten-year survival for patients with prostate cancer

Joanna Roder, Carlos Oliveira, Lelia Net, Maxim Tsypin, Benjamin Linstid, Heinrich Roder

This investigation used the same datasets as above, with the same goal of predicting 10-year survival. Here we compared the DRC classifier approach with RF. To mimic the situation of very many features of which only a few have utility for the problem in question, we added 10,000 randomly generated Gaussian features (mean = 0, standard deviation = 1) to both the development and validation data sets. For the DRC approach, rank-based kNNs were used as atomic classifiers to avoid any problems arising from differences in scale between the original and randomly generated features. All kNN classifiers (k = 7) using the 10,343 features singly, as well as pairs of features that passed single-feature filtering, were considered. Filtering was set as in the previous problem, with around 25% of the atomic classifiers considered passing filtering, and 100,000 dropout iterations were used.
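As an illustration only (this is not the authors' implementation), the sketch below shows one way to augment a feature matrix with 10,000 i.i.d. Gaussian noise features, rank-transform all features, and build a single rank-based kNN (k = 7) atomic classifier on a feature pair. The synthetic stand-in data, the variable names, and the use of NumPy/SciPy/scikit-learn are assumptions; the count of 343 original features is inferred from the 10,343 total quoted above.

```python
# Illustrative sketch only (not the authors' code): add 10,000 i.i.d. N(0, 1)
# noise features, rank-transform every feature, and build one rank-based kNN
# (k = 7) atomic classifier on a feature pair. The data here are a tiny
# synthetic stand-in for the development set.

import numpy as np
from scipy.stats import rankdata
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in: 100 samples x 343 original features (10,343 after augmentation).
X_dev = rng.normal(size=(100, 343))
y_dev = rng.integers(0, 2, size=100)          # binary 10-year survival label

def add_noise_features(X, n_noise=10_000):
    """Append n_noise randomly generated Gaussian features (mean 0, SD 1) to X."""
    noise = rng.normal(loc=0.0, scale=1.0, size=(X.shape[0], n_noise))
    return np.hstack([X, noise])

def rank_transform(X):
    """Replace each feature column by its ranks, so kNN distances are not
    dominated by scale differences between original and random features."""
    return np.apply_along_axis(rankdata, 0, X)

def atomic_knn(X_ranked, y, feature_idx, k=7):
    """One atomic classifier: a kNN (k = 7) restricted to the listed feature(s)."""
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_ranked[:, list(feature_idx)], y)
    return clf

X_aug = add_noise_features(X_dev)             # shape (100, 10_343)
X_ranked = rank_transform(X_aug)
clf_pair = atomic_knn(X_ranked, y_dev, feature_idx=(0, 5))
```

In the full approach, such atomic classifiers are enumerated over every single feature and over every feature pair that passes single-feature filtering, before being combined with dropout regularization.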

DRC and RF classifiers were generated using identical training/test set realizations, with nine subsets of the development set drawn at each of the sizes N = 24, 48, 60, 72, 84, 93, and 105 samples per class. All other parameters were the same as listed above.
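The following sketch, again illustrative rather than the authors' code, shows how class-balanced subset realizations could be drawn at each size and fed identically to both methods. scikit-learn's RandomForestClassifier stands in for the RF comparator (its settings, e.g. n_estimators, are assumptions), and the DRC fit appears only as a placeholder comment since it is not a public API.

```python
# Illustrative sketch only: draw class-balanced subsets of the augmented
# development set at each size and train the RF comparator on exactly the same
# realization that would be passed to the DRC approach.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Synthetic stand-in for the augmented development set (see previous sketch):
# 120 samples (60 per class) x 10,343 features.
X_aug = rng.normal(size=(120, 10_343))
y_dev = np.repeat([0, 1], 60)

def class_balanced_subset(X, y, n_per_class):
    """Return a subset with exactly n_per_class samples drawn from each class."""
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_per_class, replace=False)
        for c in np.unique(y)
    ])
    return X[idx], y[idx]

# Text lists sizes 24, 48, 60, 72, 84, 93, 105 per class; the synthetic
# stand-in only supports up to 60 per class, so larger sizes are omitted here.
for n_per_class in (24, 48, 60):
    X_sub, y_sub = class_balanced_subset(X_aug, y_dev, n_per_class)

    # The identical realization goes to both methods, so performance differences
    # reflect the methods themselves rather than sampling noise.
    rf = RandomForestClassifier(n_estimators=500, random_state=0)  # settings assumed
    rf.fit(X_sub, y_sub)

    # drc_model = fit_drc(X_sub, y_sub)   # hypothetical placeholder for the DRC fit
```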
