Normally distributed determinants are presented as mean (± standard deviation [SD]). Skewed variables are presented as medians with interquartile ranges (IQR). Frequencies and percentages are used for categorical data. The Kaplan-Meier method was used to estimate biochemical disease-free survival (bDFS). For comparisons between groups, the log-rank test was used.
Missing data was considered to be missing at random. Multiple imputation by chained equations was used to impute missing data, creating 20 imputation datasets. All predictors listed above, additional patient and treatment characteristics listed in Supplementary File B, the outcome, and the cumulative baseline hazard, calculated with the Nelson-Aalen function, were included in the imputation procedure [26], [27]. All subsequent modelling steps were pooled over the 20 imputation datasets.
Before fitting the multivariable model, non-linear relationships between continuous predictors and the outcome were assessed visually by plotting the predictors against log-hazard using restricted cubic splines with three knots (10th, 50th, and 90th percentile). In case of visible non-linearity, spline transformations were tested against linear modelling through univariable and multivariable Cox proportional hazards models (likelihood-ratio test). If model fit improved significantly, a spline-transformation was used. For pre-salvage PSA, a natural logarithm-transformation was used based on literature and model fit in our dataset [28].
In case correlations between candidate variables were ≥ 0.75, the clinically most relevant variable was chosen for multivariable testing. MRI-based T-stage showed high correlation with seminal vesicle involvement (correlation coefficient 0.78). Based on clinical judgement, MRI-based T-stage was therefore excluded from multivariable regression analysis. A multivariable Cox proportional hazards regression model was fitted, providing hazard ratios (HR) with 95% confidence intervals (CI). Stepwise backward elimination was performed, using lowest Akaike’s Information Criteria (AIC) for selection [29]. No interactions were assessed due to the limited sample size.
For both models the assumptions of the Cox proportional hazards model were checked. The proportionality assumption was assessed using Log-Log curves and Schoenfeld residuals for categorical and continuous variables, respectively. Linearity of continuous variables was checked with Martingale residuals. Influential outliers were assessed by calculating dfbeta residuals.
The discriminative ability of the model was assessed using Harrell’s C-statistic. Internal validation was performed through bootstrapping with 2000 resamples for each imputation set, in which all modelling steps were repeated. The optimism of each model and shrinkage factors were calculated, and the β-coefficients and C-statistic were adjusted accordingly. The predictive accuracy of the optimism-corrected models was visualized with calibration plots at 12, 24, and 36 months.
For both models a nomogram and webtool were constructed using the optimism-corrected coefficients. Finally, for each model separately, three risk groups were identified on the basis of the 25th and 75th percentile of the linear predictor. The Kaplan-Meier method was used to display the biochemical disease-free survival curves for each risk group.
All statistical analyses were performed using R studio (version 3.6.1, R Foundation for Statistical Computing, Vienna, Austria, https://rstudio.com) and the survival, survminer, rms, pmsampsize, ggplot2, mice, psfmi, DynNom, and regplot packages [30]. Reporting was according to the TRIPOD statement [29].
Do you have any questions about this protocol?
Post your question to gather feedback from the community. We will also invite the authors of this article to respond.