2.4.1. Identification with Random Survival Forest

The random survival forest is an ensemble tree-based method used to analyze right-censored survival data [27]. Because it is nonparametric, the random survival forest model can capture nonlinear effects of variables and complex interactions between them. In addition, variables without prognostic ability can be filtered out using variable importance. In this study, variable selection with the random survival forest consisted of the following three steps:

A. Construct a random survival forest model with the candidate variables, choosing the number of trees that gives the lowest error rate.

B. In the constructed random survival forest model, select and record the variables with variable importance greater than 0.

C. Because the procedure involves random processes, repeat steps A and B 100 times to generate a matrix of the variables with variable importance greater than 0. Variables recorded as important in many of the 100 repetitions are regarded as important prognostic factors.
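Step B keeps variables whose importance is greater than 0. In random survival forests, variable importance is commonly computed by permutation: the increase in out-of-bag prediction error, measured as 1 minus Harrell's concordance index (C-index), after a variable's values are randomly shuffled. The Python sketch below illustrates that idea under simplifying assumptions: it uses only the standard library and scores an arbitrary, already-fitted risk model on a single dataset rather than on per-tree out-of-bag samples. The names `harrell_c_index`, `permutation_vimp`, and `predict_risk` are hypothetical illustrations, not the authors' code or any package's API.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risks: Sequence[float]) -> float:
    """Harrell's C for right-censored data (higher risk = earlier event expected).

    A pair (i, j) is usable when subject i had an observed event and
    times[i] < times[j]. It is concordant when risks[i] > risks[j];
    tied risks count as 0.5. Pairs with tied times are skipped.
    """
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a usable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / usable if usable else float("nan")


def permutation_vimp(rows: List[Row], times: Sequence[float], events: Sequence[int],
                     predict_risk: Callable[[Row], float],
                     n_repeats: int = 10, seed: int = 0) -> Dict[str, float]:
    """Mean increase in error (1 - C) after shuffling each variable's column."""
    rng = random.Random(seed)
    base_error = 1.0 - harrell_c_index(times, events, [predict_risk(r) for r in rows])
    vimp: Dict[str, float] = {}
    for var in rows[0]:
        increases = []
        for _ in range(n_repeats):
            column = [r[var] for r in rows]
            rng.shuffle(column)  # break the link between var and survival
            permuted = [{**r, var: value} for r, value in zip(rows, column)]
            error = 1.0 - harrell_c_index(times, events,
                                          [predict_risk(r) for r in permuted])
            increases.append(error - base_error)
        vimp[var] = sum(increases) / n_repeats
    return vimp


# Toy check: risk is driven by x1 only; the "model" ignores x2.
rows = [{"x1": float(i), "x2": float((7 * i) % 10)} for i in range(10)]
times = [10.0 - r["x1"] for r in rows]  # larger x1 -> earlier event
events = [1] * 10
vimp = permutation_vimp(rows, times, events, predict_risk=lambda r: r["x1"])
selected = [v for v, imp in vimp.items() if imp > 0]  # step B: keep importance > 0
print(selected)  # ['x1']
```

A variable the model ignores gets an importance of exactly 0 and is dropped by the "greater than 0" rule, whereas a variable that drives the predicted risk gets a positive importance. In a real forest, noise variables can also receive small positive importance by chance, which is one reason the fit is repeated 100 times.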

In this study, molecular prognostic factors were identified with the random survival forest in two stages. First, all molecular factors were roughly screened: the clinical prognostic factors and all molecular factors were used as variables in the random survival forest, and variables that showed positive prognostic power (variable importance greater than 0) more than 90 times in the variable selection procedure were identified as potential important prognostic factors. Second, to identify robust molecular prognostic factors that could supplement the clinical prognostic factors, the potential important molecular factors from the rough screening were screened again, this time using them together with the clinical prognostic factors as variables of the random survival forest. This variable selection procedure was repeated 10 times to ensure robustness, and in each repetition, variables that showed positive prognostic power more than 95 times were recorded as important prognostic factors. Finally, molecular factors recorded as important prognostic factors in all 10 repetitions were regarded as the final important prognostic factors identified by the random survival forest.
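The two-stage screening amounts to counting, over repeated forest fits, how often each molecular factor has variable importance greater than 0. The sketch below assumes a hypothetical callable `vimp_fn(variables, seed)` that fits one random survival forest on the given variables and returns a dictionary of importances; in practice it would wrap an RSF implementation such as the R package randomForestSRC or scikit-survival. The thresholds (more than 90 of 100 runs in the rough screen, more than 95 of 100 runs in each of 10 rounds) follow the text above, while the seeding scheme and the toy stand-in `toy_vimp` are illustrative assumptions.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: fit one random survival forest on `variables`
# with random seed `seed` and return {variable: importance}.
# It is an assumption for illustration, not part of any library.
VimpFn = Callable[[Sequence[str], int], Dict[str, float]]


def count_positive_vimp(vimp_fn: VimpFn, variables: List[str],
                        n_runs: int = 100, seed0: int = 0) -> Counter:
    """Steps A-C: refit n_runs forests, count how often each variable has importance > 0."""
    counts: Counter = Counter()
    for run in range(n_runs):
        vimp = vimp_fn(variables, seed0 + run)
        counts.update(var for var, imp in vimp.items() if imp > 0)
    return counts


def two_stage_screen(vimp_fn: VimpFn, clinical: List[str], molecular: List[str],
                     runs: int = 100, rough_min: int = 90,
                     n_rounds: int = 10, fine_min: int = 95) -> List[str]:
    """Return the molecular factors that survive both screening stages."""
    # Stage 1 (rough screening): clinical + all molecular factors;
    # keep molecular factors with importance > 0 in more than rough_min runs.
    rough = count_positive_vimp(vimp_fn, clinical + molecular, runs, seed0=0)
    candidates = [m for m in molecular if rough[m] > rough_min]

    # Stage 2: refit with clinical + candidates in n_rounds independent
    # rounds; a factor must exceed fine_min in every round to be kept.
    kept = set(candidates)
    for rnd in range(n_rounds):
        counts = count_positive_vimp(vimp_fn, clinical + candidates, runs,
                                     seed0=(rnd + 1) * runs)
        kept &= {m for m in candidates if counts[m] > fine_min}
    return [m for m in candidates if m in kept]


def toy_vimp(variables: Sequence[str], seed: int) -> Dict[str, float]:
    """Toy stand-in for an RSF fit: each variable's importance sign follows a fixed rule."""
    positive = {
        "age": True, "stage": True, "geneA": True,
        "geneB": seed % 10 != 0,  # positive in 90 of every 100 consecutive seeds
        "geneC": seed % 25 != 0,  # positive in 96 of every 100 consecutive seeds
        "geneD": seed % 20 != 0,  # positive in 95 of every 100 consecutive seeds
        "geneE": False,
    }
    return {v: (0.01 if positive[v] else 0.0) for v in variables}


final = two_stage_screen(toy_vimp, ["age", "stage"],
                         ["geneA", "geneB", "geneC", "geneD", "geneE"])
print(final)  # ['geneA', 'geneC']
```

In the toy run, geneB (90 positive runs) fails the rough screen, geneD (95 per 100 runs) passes the rough screen but fails the stricter second stage, and only geneA and geneC are retained.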
