Models and estimators of net survival

Juste Aristide Goungounga; Célia Touraine; Nathalie Grafféo; Roch Giorgi



This method is extracted from research article: BMC Med Res Methodol, May 2019

Correcting for misclassification and selection effects in estimating net survival in clinical trials

DOI: 10.1186/s12874-019-0747-3


To estimate net survival, two settings are defined according to the availability of cause-of-death information. When this information is considered available for each patient, net survival is estimated in the “cause-specific setting”. When cause-of-death information is unavailable, or when one prefers not to rely on it, net survival is estimated in the “population-based setting”. In that setting, cause-of-death information is obtained indirectly by matching the observed data with the other-cause hazard drawn from general population life tables: the tables provide a daily hazard rate $λ_{P_{i}}$ for each individual i, matched to the general population of interest. The main assumption behind this approach is that the contribution of cancer to overall mortality in the general population is negligible. Consequently, the other-cause hazard of the studied sample is taken to be equal to the mortality hazard of the general population of interest. In both settings, the common assumption is that the excess hazard and the other-cause hazard are independent.

Furthermore, in the two settings, net survival may be estimated using non-parametric estimators or parametric models; the latter allow estimating and testing effects of covariates on the excess mortality.

Some well-known estimators presented in the next sub-section (Kaplan-Meier, Nelson-Aalen, Cox) were first developed for the analysis of all-cause mortality. For simplicity, we present their adaptation directly in the cause-specific setting, where the main change concerns the event indicator.

Here, we present briefly the properties of 1) one cause-specific method, 2) one mixed population-based and cause-specific method, and 3) one population-based method. The first is the popular Kaplan-Meier (KM) estimator in the cause-specific setting with right-censoring of the times to death from non-cancer causes. The second uses an adaptation of the Nelson-Aalen estimator to account for the informative censoring problem and uses other-cause mortality information from population life tables. The third is the Pohar-Perme (PP) estimator, a reliable estimator of net survival in population-based studies.

Estimating net survival in the cause-specific setting amounts to treating deaths from cancer as events and right-censoring both deaths from other causes and patients still alive. The KM estimator of net survival is then:
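With the counting-process notation defined in the sentence that follows, the usual product-limit form of this estimator can be written as below. This is a reconstruction from those definitions; the superscript KM is a labelling assumption.

$$
\hat{S}_{E}^{KM}(t) = \prod_{s \leq t} \left\{ 1 - \frac{dN_{E}(s)}{Y(s)} \right\}
$$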

In this equation, n is the number of patients, $N_{E} (t) = \sum_{i = 1}^{n} N_{E, i} (t)$ is the number of cancer-related deaths up to time t, obtained by summing the individual counting processes $N_{E, i} (t)$, and $Y (t) = \sum_{i = 1}^{n} I [t_{i} \geq t]$ is the at-risk process just before time t, where $I [\cdot]$ is the indicator function (i.e., $Y (t)$ counts the patients who are still alive and uncensored just before time t and who are thus still “at risk” of experiencing the event).
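As an illustration of the cause-specific KM calculation, here is a minimal Python sketch. The function name `km_net_survival` and the toy data are hypothetical and are not taken from the authors' code; deaths from other causes and patients still alive are both coded as censored (0).

```python
def km_net_survival(times, cancer_death):
    """Kaplan-Meier estimate of net survival in the cause-specific setting.

    times        : follow-up time of each patient
    cancer_death : 1 if the patient died from cancer at that time,
                   0 if censored (alive, or dead from another cause)
    Returns a list of (event_time, net_survival) pairs.
    """
    data = list(zip(times, cancer_death))
    surv = 1.0
    curve = []
    # Only times with at least one cancer death change the estimate.
    for t in sorted({u for u, d in data if d == 1}):
        at_risk = sum(1 for u, _ in data if u >= t)            # Y(t)
        deaths = sum(1 for u, d in data if u == t and d == 1)  # dN_E(t)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical toy data: 5 patients, 3 cancer deaths, 2 censored.
toy_times = [2.0, 3.0, 3.0, 5.0, 8.0]
toy_status = [1, 0, 1, 0, 1]
for t, s in km_net_survival(toy_times, toy_status):
    print(f"t = {t:.1f}  net survival = {s:.3f}")
```

Ties between a cancer death and a censoring at the same time are handled in the usual way: the censored patient is still counted as at risk at that time.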

In the cause-specific setting, Pohar-Perme et al. proposed an adaptation of the Nelson-Aalen estimator [12]. When the assumption of independence between the censoring process and the cancer death process is violated, mainly due to age, the censoring becomes informative. Using the inverse probability of censoring weighting approach on the Nelson-Aalen estimator [12], Pohar-Perme et al. derived an asymptotically unbiased estimator of the net survival:
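Using the weighted processes defined in the sentence that follows, this estimator can be written as the exponential of minus a weighted Nelson-Aalen cumulative hazard. The form below is a reconstruction from those definitions; the superscript wNA is a labelling assumption.

$$
\hat{S}_{E}^{wNA}(t) = \exp\left\{ - \int_{0}^{t} \frac{dN_{E}^{w}(u)}{Y^{w}(u)} \right\}
$$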

In this equation, $N_{E}^{w} (t) = \sum_{i = 1}^{n} N_{E, i}^{w} (t)$ and $Y^{w} (t) = \sum_{i = 1}^{n} Y_{i}^{w} (t)$ are, respectively, the weighted aggregated counting process and the weighted at-risk process. More precisely, ${dN}_{E, i}^{w} (t) = \frac{{dN}_{E, i} (t)}{S_{P, i} (t_{i} -)}$ and $Y_{i}^{w} (t) = \frac{Y_{i} (t)}{S_{P, i} (t -)}$, so that each component is weighted by the inverse of the individual expected survival, $S_{P, i} (t_{i} -)$ or $S_{P, i} (t -)$, derived from population life tables at times $t_{i} -$ and $t -$, respectively. Hereafter, we call this estimator, with these weights, the weighted Nelson-Aalen (wNA) estimator.
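The weighting can be sketched in a few lines of Python. This is only an illustration under a strong simplifying assumption: each patient has a constant population hazard, so that the expected survival is exp(-hazard × t) and is continuous (its value at t− equals its value at t). In practice the expected survival would be read from life tables. The function name `wna_net_survival` is hypothetical.

```python
import math


def wna_net_survival(times, cancer_death, pop_hazard):
    """Weighted Nelson-Aalen (wNA) estimate of net survival.

    times        : follow-up time of each patient
    cancer_death : 1 for death from cancer, 0 for censoring
    pop_hazard   : constant other-cause hazard of each patient
                   (simplifying assumption, stands in for a life table)
    Returns a list of (event_time, net_survival) pairs.
    """
    data = list(zip(times, cancer_death, pop_hazard))
    cum_hazard = 0.0
    curve = []
    for t in sorted({u for u, d, _ in data if d == 1}):
        # Weighted cancer deaths at t: each death counts 1 / S_P,i(t_i).
        dn_w = sum(d / math.exp(-lam * u) for u, d, lam in data if u == t)
        # Weighted at-risk process: each patient at risk counts 1 / S_P,i(t).
        y_w = sum(1.0 / math.exp(-lam * t) for u, _, lam in data if u >= t)
        cum_hazard += dn_w / y_w
        curve.append((t, math.exp(-cum_hazard)))
    return curve
```

When all patients share the same population hazard the weights cancel, and the estimate reduces to the unweighted Nelson-Aalen-based survival; the weights matter only when expected survival differs between patients, for example because of age.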

The PP estimator [12] is a reliable non-parametric estimator of net survival developed to overcome some assumptions of excess hazard modeling. It corresponds to the difference between the Nelson-Aalen estimate and the cumulative population hazard of the patients still at risk at each death time, where the at-risk process and the counting process are weighted to give greater weight to subjects with a high risk of other-cause mortality. The PP estimator of net survival is:
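Following the verbal description above, the estimator can be written as the exponential of minus the difference between a weighted Nelson-Aalen term for all-cause deaths and a weighted cumulative population hazard. The form below is a reconstruction from the definitions given in the next sentence; the superscript PP and the notation $\lambda_{P_i}(u)$ for the population hazard of individual i at time u are labelling assumptions.

$$
\hat{S}_{E}^{PP}(t) = \exp\left\{ - \int_{0}^{t} \frac{dN^{w}(u)}{Y^{w}(u)} + \int_{0}^{t} \frac{\sum_{i = 1}^{n} Y_{i}^{w}(u) \, \lambda_{P_i}(u)}{Y^{w}(u)} \, du \right\}
$$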

where $N^{w} (t) = \sum_{i = 1}^{n} N_{i}^{w} (t)$ is the sum of the individual all-cause counting processes $N_{i}^{w} (t)$, with $d N_{i}^{w} (t) = \frac{d N_{i} (t)}{S_{P, i} (t)}$, which, like $Y_{i}^{w} (t)$, is weighted by the inverse of the individual expected survival. This latter quantity and the general population other-cause mortality $λ_{P}$ are derived from population life tables.

Among these non-parametric estimators, only the KM estimator is purely cause-specific because, in the cause-specific setting considered here, it uses only cancer-specific death information. Although it is used in the cause-specific setting, the wNA estimator also uses the population other-cause mortality to correct the estimation of net survival. The PP estimator is used only in population-based settings; it needs the other-cause mortality and the vital status to provide an estimate of net survival.

In the cause-specific setting, the semiparametric Cox proportional hazards model expresses the excess hazard at time t as $λ_{E} (t, X) = λ_{E, 0} (t) \exp (β^{T} X)$, where $λ_{E, 0} (t)$ is the baseline excess hazard at time t and β is the vector of proportional linear effects of the covariates X on the baseline excess hazard; β is estimated separately from the baseline through the semiparametric approach.

One solution to derive the baseline cumulative excess hazard function from the Cox model was given by Breslow [20]. Using the Breslow estimator with cancer death as the status indicator $δ_{i}$, the baseline cumulative excess hazard function $Λ_{E, 0} (t)$ may be estimated with the following expression, evaluated at the times $t_{i}$ at which events occur:
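In its standard form, and with the symbols explained in the sentence that follows, the Breslow estimator can be written as below. This is a reconstruction from those definitions; the index j over the members of the risk set is a notational assumption.

$$
\hat{\Lambda}_{E, 0}(t) = \sum_{i : \, t_{i} \leq t} \frac{\delta_{i}}{\sum_{j \in R(t_{i})} \exp\left( \hat{\beta}^{T} X_{j} \right)}
$$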

In this equation, $R (t)$ denotes the risk set at time t, i.e., all individuals still at risk of death from cancer at time t, $\hat{β}$ corresponds to the estimated effects of the covariates, and $δ_{i}$ is the indicator of death due to the cancer of interest. With this formula, the unadjusted cumulative excess hazard is obtained by setting $\hat{β} = 0$ [21].
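The Breslow calculation can be sketched in Python as follows. The function name `breslow_baseline` is hypothetical, and the coefficients `beta` are taken as given (in practice they would come from a fitted Cox model); setting them to zero gives the unadjusted cumulative excess hazard.

```python
import math


def breslow_baseline(times, cancer_death, covariates, beta):
    """Breslow estimate of the baseline cumulative excess hazard.

    times        : follow-up time of each patient
    cancer_death : 1 for death from cancer, 0 for censoring
    covariates   : one list of covariate values per patient
    beta         : regression coefficients, assumed already estimated
    Returns a list of (event_time, cumulative_hazard, net_survival).
    """
    def risk_score(x):
        # exp(beta^T x) for one patient
        return math.exp(sum(b * xi for b, xi in zip(beta, x)))

    data = list(zip(times, cancer_death, covariates))
    cum_hazard = 0.0
    curve = []
    for t in sorted({u for u, d, _ in data if d == 1}):
        deaths = sum(d for u, d, _ in data if u == t)
        # Sum of risk scores over the risk set R(t)
        denom = sum(risk_score(x) for u, _, x in data if u >= t)
        cum_hazard += deaths / denom
        curve.append((t, cum_hazard, math.exp(-cum_hazard)))
    return curve
```

With `beta = [0.0]` every risk score equals 1, so each increment is simply the number of cancer deaths divided by the number of patients at risk.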

The corresponding net survival from Breslow’s estimator is therefore:
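Given the usual relation between a survival function and its cumulative hazard, this can be written as follows (a reconstruction, with notation matching the sentence after it):

$$
\hat{S}_{E}(t) = \exp\left\{ - \hat{\Lambda}_{E, 0}(t) \right\}
$$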

where ${\hat{Λ}}_{E, 0}$ corresponds to the baseline cumulative excess hazard function estimated separately from $\hat{β}$ .

The new model is an extension of the flexible parametric excess hazard model proposed by Giorgi et al. [15]. It is based on the seminal excess hazard model of Estève et al. [14] where the observed hazard of a patient i at time t_i is:
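In additive form, the observed hazard is the sum of the other-cause (population) hazard and the excess hazard. The expression below is a reconstruction from the surrounding definitions; the symbol $\lambda_{O}$ for the observed hazard is a labelling assumption, and $\lambda_{P_i}$ denotes the population hazard read from the life table for individual i.

$$
\lambda_{O}(t_{i}, X_{i}) = \lambda_{P_i} + \lambda_{E, 0}(t_{i}) \exp\left( \beta^{T} X_{i} \right)
$$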

where the baseline excess hazard ($λ_{E, 0}$) is modelled by a piecewise constant function and β represents the effects of the vector of covariates X, which includes the demographic variables Z of individual i (such as age at diagnosis, year of diagnosis, and sex).

In the model of Giorgi et al. [15], the baseline excess hazard and the time-dependent effects of covariates are both modelled using specific B-spline functions. More precisely, for a higher degree of flexibility, Giorgi et al. used quadratic B-splines (order 3) and two interior knots.

For simplicity, only the case of proportional hazard effects of prognostic covariates is considered. The simplified version of Giorgi’s model is then:
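With the symbols explained in the sentence that follows, one way to write this simplified model is given below. This is a reconstruction under two assumptions: the B-splines are taken to model the logarithm of the baseline excess hazard (which keeps it positive), and the range of the index j is left unspecified. The symbol $\lambda_{O}$ for the observed hazard is also a labelling assumption.

$$
\lambda_{O}(t_{i}, X_{i}) = \lambda_{P_i} + \exp\left( \sum_{j} v_{j} B_{j, 3}(t_{i}) \right) \exp\left( \beta^{T} X_{i} \right)
$$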

where $v_{j}$ are the spline coefficients, $B_{j, 3} (t_{i})$ is the value at time $t_{i}$ of the j-th B-spline of order 3 (degree 2), X is the vector of covariates with proportional hazard effects β, and $λ_{P_{i}}$ is the other-cause hazard of individual i at age $a_{i} + t_{i}$ in year $y_{i} + t_{i}$.

In agreement with Cheuvart and Ryan [19], we considered that the other-cause mortality of a participant in a clinical trial may be corrected by multiplying the population hazard obtained from the life table by a scale parameter α. This parameter is the average effect of selection on the other-cause mortality in the trial participants. In the general population with the same demographic characteristics, there is no such effect and α = 1. The new flexible model, which we call the “rescaled B-spline” (RBS) model, can be written as follows:
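Applying the scale parameter α to the population hazard and keeping a spline-based excess hazard, the model could be written as below. This is a reconstruction under the assumption that the B-splines model the logarithm of the baseline excess hazard; the symbol $\lambda_{O}$ for the observed hazard and the unspecified range of j are also assumptions.

$$
\lambda_{O}(t_{i}, X_{i}) = \alpha \, \lambda_{P_i} + \exp\left( \sum_{j} v_{j} B_{j, 3}(t_{i}) \right) \exp\left( \beta^{T} X_{i} \right)
$$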

To estimate the parameters of the RBS model, we used the maximum likelihood procedure. The log-likelihood of the RBS model can be written:
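For right-censored data, each individual contributes the observed hazard at the time of death (if a death occurred) multiplied by the overall survival at the end of follow-up. With an observed hazard of the form $\alpha \lambda_{P} + \lambda_{E}$, this gives the log-likelihood below. It is a reconstruction from the definitions in the next sentence, where $\lambda_{E}(t_{i}, X_{i})$ denotes the excess hazard of individual i and $\Lambda_{E}(t_{i}, X_{i}) = \int_{0}^{t_{i}} \lambda_{E}(u, X_{i}) \, du$ its cumulative value (notational assumptions).

$$
\ell(\alpha, v, \beta) = \sum_{i = 1}^{n} \left\{ - \Lambda_{E}(t_{i}, X_{i}) - \alpha \, \Lambda_{P_i} + \delta_{i} \ln\left[ \alpha \, \lambda_{P_i} + \lambda_{E}(t_{i}, X_{i}) \right] \right\}
$$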

where, for a given individual i, $δ_{i}$ is the indicator of death from any cause ($δ_{i} = 1$ if the patient died), α is the scale parameter of the instantaneous other-cause mortality $λ_{P}$, and $Λ_{P}$ is the cumulative value of $λ_{P}$ over the whole follow-up duration. Unlike in the classical additive excess hazard model, where the cumulative population hazard is a constant that does not affect the maximization, the rescaled cumulative population hazard $αΛ_{P}$ depends on α; the scale parameter α is therefore part of the estimation process. For mathematical convenience, and because all patients may die from a cause other than the cancer under study, it is assumed that α > 0 and that α is constant over time. The log-likelihood was maximized with the optim function in R, using the method of Byrd et al. for non-linear optimization problems with box constraints [22]. The estimates of net survival were derived from the cumulative excess hazard, obtained by integrating the corresponding estimate of the excess hazard function. The confidence intervals of the net survival estimates were obtained with a Monte-Carlo method [17]. The R code that implements these estimation procedures is available on request from the authors.
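To make the role of α in the likelihood concrete, here is a minimal Python sketch of a log-likelihood with a rescaled population hazard. It is not the authors' R implementation: for brevity the B-spline baseline is replaced by a constant baseline excess hazard (a loudly simplifying assumption), and the function name `rescaled_log_likelihood`, the record layout, and the toy values are all hypothetical.

```python
import math


def rescaled_log_likelihood(alpha, log_baseline, beta, records):
    """Log-likelihood of an additive model with rescaled population hazard.

    alpha        : scale parameter of the other-cause hazard (must be > 0)
    log_baseline : log of a CONSTANT baseline excess hazard
                   (stand-in for the B-spline baseline)
    beta         : proportional effects of the covariates
    records      : tuples (t, delta, x, lam_p, cum_lam_p), where delta = 1
                   for death from any cause, lam_p is the population hazard
                   at time t and cum_lam_p its cumulative value over follow-up
    """
    if alpha <= 0:
        raise ValueError("alpha must be strictly positive")
    total = 0.0
    for t, delta, x, lam_p, cum_lam_p in records:
        lam_e = math.exp(log_baseline + sum(b * xi for b, xi in zip(beta, x)))
        cum_lam_e = lam_e * t  # a constant hazard integrates to lam_e * t
        total += -cum_lam_e - alpha * cum_lam_p
        if delta == 1:
            total += math.log(alpha * lam_p + lam_e)
    return total


# Hypothetical toy records: (time, death indicator, covariates, lam_p, cum_lam_p)
toy_records = [
    (2.0, 1, [0.0], 0.10, 0.20),
    (5.0, 0, [1.0], 0.05, 0.25),
    (3.0, 1, [1.0], 0.08, 0.24),
]
# Crude scan of alpha over a bounded grid, other parameters held fixed.
grid = [0.25, 0.5, 1.0, 1.5, 2.0]
best_alpha = max(
    grid,
    key=lambda a: rescaled_log_likelihood(a, math.log(0.2), [0.3], toy_records),
)
print("alpha with the highest log-likelihood on the grid:", best_alpha)
```

The grid scan only illustrates that the term in α cannot be dropped from the objective; a real fit would maximize over all parameters jointly with a box-constrained optimizer, as described in the text.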

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
