4.10. Colon Cancer Cohort Risk Prediction and Survival Analysis in Different CMSubtypes

Santiago Bueno-Fortes
Julienne K. Muenzner
Alberto Berral-Gonzalez
Chuanpit Hampel
Pablo Lindner
Alexandra Berninger
Kerstin Huebner
Philipp Kunze
Tobias Bäuerle
Katharina Erlenbach-Wuensch
José Manuel Sánchez-Santos
Arndt Hartmann
Javier De Las Rivas
Regine Schneider-Stock

To evaluate survival and predict risk for the groups of CRC tumors classified according to the consensus molecular subtypes (CMSs), a robust version of the multivariate Cox regression model was applied. We used regularized multivariate Cox proportional-hazards regression with an L1-norm penalty [42], with the aim of building a multigenic risk predictor. A recursive algorithm, using double-nested cross-validation with optimization of the regression parameters, searched for the risk-score value that best split the cohort into two groups: low risk and high risk. The results of this analysis for CRC tumor samples of subtypes CMS1 (167 samples) and CMS4 (246 samples) are presented in Figure 5A,B. Once each patient’s risk had been calculated, a Kaplan–Meier analysis was used to check the separation of the two groups according to the survival data: (i) a high-risk group of individuals (with poor survival, plotted in red) and (ii) a low-risk group of individuals (with good survival, plotted in blue) (Figure 5C,D). A log-rank test evaluated the difference between the Kaplan–Meier curves of the two groups of patients for each gene signature tested. This statistical test is non-parametric and makes no explicit assumptions about the shape of the survival curves.
The penalty procedure shrank to zero the coefficient (the beta value) of any feature of the multivariate model (i.e., any gene) that did not contribute to the risk prediction. Thus, the multivariate model selected the features (i.e., the genes) with the most predictive power and provided a risk incidence score for each gene tested in the risk prediction (which can be interpreted as a coefficient of the influence of that gene on the predicted risk) (Figure 5E,F). The genes with the highest risk incidence scores (i.e., the genes with the largest beta values in the multivariate model) are expected to split the patients into groups with different prognoses at a statistically significant p-value.
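The effect of the L1 penalty on the Cox coefficients can be illustrated with a small, self-contained sketch. It is not the authors' R implementation: it uses Python's standard library only, fits the penalized model by plain proximal gradient descent, runs on simulated data in which only a hypothetical "gene 0" influences the hazard, and splits the cohort at the median risk score instead of searching for the best cut-off by double-nested cross-validation. All function names, the penalty strength `lam`, and the step size are assumptions made for illustration.

```python
import math
import random
import statistics

def neg_partial_loglik_grad(X, time, event, beta):
    """Gradient of the negative Cox partial log-likelihood (Breslow ties), divided by n."""
    n, p = len(X), len(beta)
    weight = [math.exp(sum(x[j] * beta[j] for j in range(p))) for x in X]
    order = sorted(range(n), key=lambda i: -time[i])  # latest times first
    grad = [0.0] * p
    s0, s1 = 0.0, [0.0] * p  # running sums over the current risk set
    i = 0
    while i < n:
        j = i
        # Every sample sharing this time joins the risk set before events are scored.
        while j < n and time[order[j]] == time[order[i]]:
            k = order[j]
            s0 += weight[k]
            for c in range(p):
                s1[c] += weight[k] * X[k][c]
            j += 1
        for k in order[i:j]:
            if event[k]:
                for c in range(p):
                    grad[c] -= X[k][c] - s1[c] / s0
        i = j
    return [g / n for g in grad]

def soft_threshold(z, t):
    """Proximal operator of the L1 norm: moves z toward zero by t, clipping at zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_lasso_cox(X, time, event, lam, step=0.1, n_iter=300):
    """Proximal gradient descent for the L1-penalized Cox model (illustrative only)."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = neg_partial_loglik_grad(X, time, event, beta)
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta

# Simulated cohort (assumption): 150 samples, 4 hypothetical genes, only gene 0 matters.
random.seed(0)
n, p = 150, 4
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
raw_time = [random.expovariate(math.exp(1.5 * x[0])) for x in X]
event = [1 if t < 3.0 else 0 for t in raw_time]  # administrative censoring at t = 3
time = [min(t, 3.0) for t in raw_time]

beta = fit_lasso_cox(X, time, event, lam=0.1)
risk = [sum(b * v for b, v in zip(beta, x)) for x in X]  # linear risk score per sample
cutoff = statistics.median(risk)  # simplification: median split, not an optimized cut-off
high_risk = [r > cutoff for r in risk]
print("beta:", [round(b, 3) for b in beta])
print("high-risk samples:", sum(high_risk), "of", n)
```

In this sketch a coefficient is set exactly to zero whenever the gradient for that gene stays below the penalty strength, which is the mechanism by which the penalty removes uninformative genes from the risk predictor.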
Individual Kaplan–Meier curves can also be calculated for each gene, to test how well the expression of that gene alone splits a given cohort of tumor samples into low-risk and high-risk groups of patients according to the gene expression level in those samples (as shown in Figure 6). All these procedures and methods were developed and applied using the R programming language for statistical computing (https://www.r-project.org/, last accessed 10 October 2021).
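A single-gene analysis of this kind can be sketched as follows. The example is written in Python with the standard library only (the authors worked in R), the expression values and follow-up times are invented, and the median expression is assumed as the cut-off between the two groups, which the text does not specify. The log-rank p-value is obtained from the chi-square distribution with one degree of freedom.

```python
import math
import statistics

def kaplan_meier(time, event):
    """Kaplan-Meier estimate: list of (t, S(t)) at each distinct event time."""
    n_at_risk = len(time)
    surv, curve = 1.0, []
    for t in sorted(set(time)):
        deaths = sum(1 for ti, ei in zip(time, event) if ti == t and ei)
        leaving = sum(1 for ti in time if ti == t)  # events plus censored at t
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving
    return curve

def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 d.f."""
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({ti for ti, ei in zip(time, event) if ei}):
        n = sum(1 for ti in time if ti >= t)                         # at risk, both groups
        n1 = sum(1 for ti, g in zip(time, group) if ti >= t and g)   # at risk, group 1
        d = sum(1 for ti, ei in zip(time, event) if ti == t and ei)  # events at t
        d1 = sum(1 for ti, ei, g in zip(time, event, group) if ti == t and ei and g)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 d.f.
    return chi2, p_value

# Invented data for one hypothetical gene: expression, follow-up (months), event flag.
expr = [2.1, 8.4, 3.3, 7.9, 1.5, 9.2, 4.0, 6.8, 2.7, 8.8]
time = [60, 12, 55, 20, 70, 8, 48, 25, 66, 15]
event = [0, 1, 1, 1, 0, 1, 0, 1, 0, 1]

cutoff = statistics.median(expr)  # assumption: median split on expression
high = [e > cutoff for e in expr]

km_high = kaplan_meier([t for t, h in zip(time, high) if h],
                       [e for e, h in zip(event, high) if h])
km_low = kaplan_meier([t for t, h in zip(time, high) if not h],
                      [e for e, h in zip(event, high) if not h])
chi2, p_value = logrank_test(time, event, high)
print("high-expression curve:", km_high)
print("low-expression curve:", km_low)
print("log-rank chi2 = %.2f, p = %.4f" % (chi2, p_value))
```

Repeating the same split and test for every gene in a signature gives one pair of curves and one p-value per gene, which is the kind of per-gene comparison referred to in Figure 6.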
