Testing the ability of the classifier development method to work well for a dataset with very many, but few useful, features: predicting ten-year survival for patients with prostate cancer

Joanna Roder, Carlos Oliveira, Lelia Net, Maxim Tsypin, Benjamin Linstid, Heinrich Roder

This investigation used the same datasets as above, with the same goal of predicting 10-year survival. Here we compared the DRC classifier approach with RF. To mimic the situation of very many features of which only a few have utility for the problem in question, we added 10,000 randomly generated Gaussian features (mean = 0, standard deviation = 1) to both the development and validation data sets. For the DRC approach, rank-based kNNs were used as atomic classifiers to avoid any problems arising from differences in scale between the original and randomly generated features. All kNN classifiers (k = 7) using the 10,343 features singly, as well as pairs of features that passed single-feature filtering, were considered. Filtering was set as in the previous problem, with around 25% of the atomic classifiers considered passing filtering, and 100,000 dropout iterations were used.
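As an illustration only (this is not the authors' implementation), the sketch below shows one way to augment a feature matrix with 10,000 i.i.d. Gaussian noise features, rank-transform all features, and build a single rank-based kNN (k = 7) atomic classifier on a feature pair. The synthetic stand-in data, the variable names, and the use of NumPy/SciPy/scikit-learn are assumptions; the count of 343 original features is inferred from the 10,343 total quoted above.

```python
# Illustrative sketch only (not the authors' code): add 10,000 i.i.d. N(0, 1)
# noise features, rank-transform every feature, and build one rank-based kNN
# (k = 7) atomic classifier on a feature pair. The data here are a tiny
# synthetic stand-in for the development set.

import numpy as np
from scipy.stats import rankdata
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in: 100 samples x 343 original features (10,343 after augmentation).
X_dev = rng.normal(size=(100, 343))
y_dev = rng.integers(0, 2, size=100)          # binary 10-year survival label

def add_noise_features(X, n_noise=10_000):
    """Append n_noise randomly generated Gaussian features (mean 0, SD 1) to X."""
    noise = rng.normal(loc=0.0, scale=1.0, size=(X.shape[0], n_noise))
    return np.hstack([X, noise])

def rank_transform(X):
    """Replace each feature column by its ranks, so kNN distances are not
    dominated by scale differences between original and random features."""
    return np.apply_along_axis(rankdata, 0, X)

def atomic_knn(X_ranked, y, feature_idx, k=7):
    """One atomic classifier: a kNN (k = 7) restricted to the listed feature(s)."""
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_ranked[:, list(feature_idx)], y)
    return clf

X_aug = add_noise_features(X_dev)             # shape (100, 10_343)
X_ranked = rank_transform(X_aug)
clf_pair = atomic_knn(X_ranked, y_dev, feature_idx=(0, 5))
```

In the full approach, such atomic classifiers are enumerated over every single feature and over every feature pair that passes single-feature filtering, before being combined with dropout regularization.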

DRC and RF classifiers were generated using identical training/test set realizations, with nine subsets of the development set drawn at each of the sizes N = 24, 48, 60, 72, 84, 93, and 105 samples per class. All other parameters were the same as listed above.
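The following sketch, again illustrative rather than the authors' code, shows how class-balanced subset realizations could be drawn at each size and fed identically to both methods. scikit-learn's RandomForestClassifier stands in for the RF comparator (its settings, e.g. n_estimators, are assumptions), and the DRC fit appears only as a placeholder comment since it is not a public API.

```python
# Illustrative sketch only: draw class-balanced subsets of the augmented
# development set at each size and train the RF comparator on exactly the same
# realization that would be passed to the DRC approach.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Synthetic stand-in for the augmented development set (see previous sketch):
# 120 samples (60 per class) x 10,343 features.
X_aug = rng.normal(size=(120, 10_343))
y_dev = np.repeat([0, 1], 60)

def class_balanced_subset(X, y, n_per_class):
    """Return a subset with exactly n_per_class samples drawn from each class."""
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_per_class, replace=False)
        for c in np.unique(y)
    ])
    return X[idx], y[idx]

# Text lists sizes 24, 48, 60, 72, 84, 93, 105 per class; the synthetic
# stand-in only supports up to 60 per class, so larger sizes are omitted here.
for n_per_class in (24, 48, 60):
    X_sub, y_sub = class_balanced_subset(X_aug, y_dev, n_per_class)

    # The identical realization goes to both methods, so performance differences
    # reflect the methods themselves rather than sampling noise.
    rf = RandomForestClassifier(n_estimators=500, random_state=0)  # settings assumed
    rf.fit(X_sub, y_sub)

    # drc_model = fit_drc(X_sub, y_sub)   # hypothetical placeholder for the DRC fit
```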
