Random forest training inherently relies on bootstrapping to build the multiple decision trees of the forest. Conventionally, a distinct decision tree is trained on each bootstrap sample. In this work, 100 bootstrap samples were drawn to train each random forest constructed from the training set (H&N1 and H&N2 cohorts; n = 194). For each bootstrap sample, the imbalance-adjustment strategy detailed above was applied, such that each bootstrap sample yielded multiple decision trees (one per partition) to be appended to the random forest. The final number of decision trees per random forest therefore depended on the actual proportion of events in each bootstrap sample for each outcome studied. The three final random forest models developed in this work (italic font in Table 1, Supplementary Table S4) comprised 582, 661 and 518 decision trees for LR, DM and OS, respectively.
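As a rough sketch, the bootstrap-plus-partition training loop described above could look like the following. The exact partitioning rule is the imbalance-adjustment strategy detailed earlier in the Methods; the partitioning shown here and the `fit_tree` learner are simplified placeholders, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_tree(X, y):
    # Placeholder for a real decision-tree learner (e.g. CART);
    # here it simply records the event proportion of its partition.
    return y.mean()

def imbalance_partitions(y_boot):
    # Hypothetical partitioning: majority-class instances are split into
    # roughly minority-sized chunks, and each partition pairs all minority
    # instances with one chunk. The number of partitions (hence trees)
    # thus depends on the event proportion of the bootstrap sample,
    # as stated in the text.
    minority = np.flatnonzero(y_boot == 1)
    majority = rng.permutation(np.flatnonzero(y_boot == 0))
    n_parts = max(1, int(np.ceil(majority.size / max(minority.size, 1))))
    for chunk in np.array_split(majority, n_parts):
        yield np.concatenate([minority, chunk])

def train_forest(X, y, n_bootstraps=100):
    # One bootstrap sample per iteration; each sample contributes one
    # decision tree per partition, so the final forest size varies.
    trees = []
    n = len(y)
    for _ in range(n_bootstraps):
        boot = rng.integers(0, n, size=n)  # sample with replacement
        for part in imbalance_partitions(y[boot]):
            idx = boot[part]
            trees.append(fit_tree(X[idx], y[idx]))
    return trees
```

Under this kind of scheme, 100 bootstraps of n = 194 with an event proportion around 25% would yield on the order of a few hundred trees per forest, consistent with the 518-661 trees reported above.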
In addition to the imbalance-adjustment strategy described above, under/oversampling of the instances in each partition of an ensemble was used to further correct for data imbalance during random forest training. Under/oversampling weights of the minority class ranging from 0.5 to 2 in increments of 0.1 were tested. The optimal weight for a given training process (as well as the optimal clinical staging variables to be used) was estimated by stratified random sub-sampling, maximizing the average AUC: the training set was randomly separated into multiple sub-training and sub-testing sets (n = 10) with a 2:1 size ratio and an equal proportion of events. The final random forest models developed in this work (italic font in Table 1, Supplementary Table S4) used oversampling weights of 1.4, 1.6 and 1.7 (in conjunction with the previously described imbalance-adjustment strategy) to train the decision trees of the forests for LR, DM and OS, respectively. The overall random forest training process is pictured in Supplementary Fig. S7.
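The weight-selection procedure can be sketched as a grid search over weights 0.5-2.0 in 0.1 steps, with ten stratified 2:1 sub-train/sub-test splits per weight and the highest mean AUC deciding the winner. In this sketch, `toy_fit_predict` is a hypothetical stand-in learner, not the paper's random forest, and the rank-based AUC is computed directly to keep the example self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)

def auc(scores, labels):
    # ROC AUC via the Mann-Whitney U statistic
    # (assumes continuous, essentially tie-free scores).
    ranks = np.empty(scores.size)
    ranks[np.argsort(scores)] = np.arange(1, scores.size + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def stratified_split(y, test_frac=1/3):
    # Random sub-sampling split preserving the event proportion
    # (a 2:1 sub-training/sub-testing ratio when test_frac = 1/3).
    tr, te = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        cut = int(round(test_frac * idx.size))
        te.extend(idx[:cut])
        tr.extend(idx[cut:])
    return np.array(tr), np.array(te)

def toy_fit_predict(X_tr, y_tr, w, X_te):
    # Hypothetical learner: re-sample the minority class by weight w,
    # then score test points with a nearest-class-mean rule.
    mino = np.flatnonzero(y_tr == 1)
    majo = np.flatnonzero(y_tr == 0)
    if w >= 1:   # oversample the minority class
        keep = np.r_[majo, mino, rng.choice(mino, size=int((w - 1) * mino.size))]
    else:        # undersample the minority class
        keep = np.r_[majo, rng.choice(mino, size=max(1, int(w * mino.size)),
                                      replace=False)]
    Xa, ya = X_tr[keep], y_tr[keep]
    m0, m1 = Xa[ya == 0].mean(axis=0), Xa[ya == 1].mean(axis=0)
    return (np.linalg.norm(X_te - m0, axis=1)
            - np.linalg.norm(X_te - m1, axis=1))

def tune_weight(X, y, fit_predict, n_splits=10):
    # Grid search over under/oversampling weights 0.5..2.0 (step 0.1),
    # keeping the weight with the highest mean AUC across the splits.
    best_w, best_auc = None, -np.inf
    for w in np.round(np.arange(0.5, 2.01, 0.1), 1):
        mean_auc = np.mean([
            auc(fit_predict(X[tr], y[tr], w, X[te]), y[te])
            for tr, te in (stratified_split(y) for _ in range(n_splits))
        ])
        if mean_auc > best_auc:
            best_w, best_auc = w, mean_auc
    return best_w, best_auc
```

In the actual protocol this search would be run once per outcome (LR, DM, OS), which is how the reported weights of 1.4, 1.6 and 1.7 were obtained.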