Random forest training

Martin Vallières, Emily Kay-Rivest, Léo Jean Perrin, Xavier Liem, Christophe Furstoss, Hugo J. W. L. Aerts, Nader Khaouam, Phuc Felix Nguyen-Tan, Chang-Shu Wang, Khalil Sultanem, Jan Seuntjens, Issam El Naqa

Random forest training inherently uses bootstrapping to train the multiple decision trees of the forest. Conventionally, one decision tree is trained per bootstrap sample. In this work, we used 100 bootstrap samples to train each random forest constructed from the training set (H&N1 and H&N2 cohorts; n = 194). For each bootstrap sample, the imbalance-adjustment strategy detailed above was applied, such that each bootstrap sample produced multiple decision trees (one per partition) to be appended to the random forest. The final number of decision trees per random forest therefore depended on the actual proportion of events in each bootstrap sample for each outcome studied. The three final random forest models developed in this work (italic font in Table 1, Supplementary Table S4) were constructed using 582, 661 and 518 decision trees for LR, DM and OS, respectively.
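As an illustration only, the sketch below (Python with scikit-learn, not the authors' original implementation) shows one way this bootstrap plus imbalance-adjusted tree-growing step could be coded. The function names, the rule used to partition the majority class, the assumption of binary 0/1 outcome labels, and the tree hyperparameters are all assumptions for demonstration and not taken from the protocol.

```python
# Minimal sketch of bootstrap-based, imbalance-adjusted forest growing.
# Assumes binary outcome labels coded 0/1; helper names are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_imbalance_adjusted_forest(X, y, n_bootstrap=100, random_state=0):
    """Grow a forest: draw bootstrap samples, then train one tree per
    balanced partition of each bootstrap sample."""
    rng = np.random.default_rng(random_state)
    forest = []
    n = len(y)
    for _ in range(n_bootstrap):
        # Draw a bootstrap sample (sampling with replacement, same size as X).
        idx = rng.integers(0, n, size=n)
        Xb, yb = X[idx], y[idx]

        # Imbalance adjustment (illustrative): split the majority class into
        # partitions roughly the size of the minority class, so that each
        # partition used to train a tree is approximately balanced.
        minority = int(np.bincount(yb).argmin())
        min_idx = np.where(yb == minority)[0]
        maj_idx = rng.permutation(np.where(yb != minority)[0])
        n_parts = max(1, len(maj_idx) // max(1, len(min_idx)))

        for part in np.array_split(maj_idx, n_parts):
            sel = np.concatenate([min_idx, part])
            tree = DecisionTreeClassifier(max_features="sqrt",
                                          random_state=int(rng.integers(1 << 31)))
            tree.fit(Xb[sel], yb[sel])
            forest.append(tree)  # one tree per partition, appended to the forest
    return forest

def forest_predict_proba(forest, X):
    """Average the per-tree probabilities of the positive class (label 1)."""
    return np.mean([t.predict_proba(X)[:, 1] for t in forest], axis=0)
```

With 100 bootstrap samples and a variable number of partitions per sample, the total tree count varies with the event proportion of each bootstrap sample, which is consistent with the outcome-dependent forest sizes (582, 661 and 518 trees) reported above.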

In addition to the imbalance-adjustment strategy adopted in this work, under/oversampling of the instances in each partition of an ensemble was used to further correct for data imbalance during random forest training. Under/oversampling weights for the minority class ranging from 0.5 to 2 in increments of 0.1 were tested. Stratified random sub-sampling, which randomly separates the training set of this work into multiple sub-training and sub-testing sets (n = 10) with a 2:1 size ratio and equal proportions of events, was used to estimate the optimal weight for a given training process (and also the optimal clinical staging variables to be used), defined as the weight maximizing the average AUC over the sub-testing sets. The final random forest models developed in this work (italic font in Table 1, Supplementary Table S4) used oversampling weights of 1.4, 1.6 and 1.7 (in conjunction with the previously described imbalance-adjustment strategy) to train the decision trees of the forests for LR, DM and OS, respectively. The overall random forest training process is pictured in Supplementary Fig. S7.
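The following sketch illustrates how such a weight search could be carried out, reusing the forest-training helpers from the previous sketch. The `resample_minority` helper, the use of scikit-learn's `StratifiedShuffleSplit`, and all settings other than the weight grid (0.5 to 2 in steps of 0.1), the 10 splits and the 2:1 sub-training/sub-testing ratio are illustrative assumptions rather than the authors' code.

```python
# Hedged sketch of selecting the minority-class under/oversampling weight by
# maximal average AUC over stratified random sub-samples of the training set.
# Depends on train_imbalance_adjusted_forest / forest_predict_proba above.
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import roc_auc_score
from sklearn.utils import resample

def resample_minority(X, y, weight, random_state=0):
    """Hypothetical helper: resample the minority class to weight * its size
    (undersampling for weight < 1, oversampling for weight > 1)."""
    minority = int(np.bincount(y).argmin())
    min_idx = np.where(y == minority)[0]
    maj_idx = np.where(y != minority)[0]
    n_target = max(1, int(round(weight * len(min_idx))))
    new_min = resample(min_idx, n_samples=n_target,
                       replace=n_target > len(min_idx),
                       random_state=random_state)
    keep = np.concatenate([maj_idx, new_min])
    return X[keep], y[keep]

def select_sampling_weight(X, y, weights=np.arange(0.5, 2.01, 0.1), n_splits=10):
    """Return the weight maximizing the average AUC over stratified sub-samples."""
    # 2:1 sub-training/sub-testing ratio; stratification keeps event proportions equal.
    splitter = StratifiedShuffleSplit(n_splits=n_splits, test_size=1/3, random_state=0)
    best_w, best_auc = None, -np.inf
    for w in weights:
        aucs = []
        for train_idx, test_idx in splitter.split(X, y):
            Xw, yw = resample_minority(X[train_idx], y[train_idx], w)
            forest = train_imbalance_adjusted_forest(Xw, yw)
            probs = forest_predict_proba(forest, X[test_idx])
            aucs.append(roc_auc_score(y[test_idx], probs))
        if np.mean(aucs) > best_auc:
            best_w, best_auc = w, np.mean(aucs)
    return best_w, best_auc
```

Because the same stratified splits are reused for every candidate weight, the average AUC values are directly comparable across weights, which is the basis for picking a single optimal weight per outcome (e.g., 1.4, 1.6 and 1.7 for LR, DM and OS in this work).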
