Statistical methods

Laura J. van ‘t Veer
Christina Yau
Nancy Y. Yu
Christopher C. Benz
Bo Nordenskjöld
Tommy Fornander
Olle Stål
Laura J. Esserman
Linda Sofie Lindström

The outcome of interest was death due to breast cancer. Analyses of long-term (20-year) breast cancer-specific survival by 70-gene risk classification (high and low risk) were performed in patients with ER-positive tumors. Patient follow-up started at the date of primary breast cancer diagnosis and ended at the date of death, contralateral breast cancer diagnosis, emigration from Sweden (only five women emigrated in total), or end of study follow-up (December 31, 2012), whichever came first.
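
The follow-up rule above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name, argument names, and the 365.25-day year are assumptions, and only the study end date (December 31, 2012) comes from the text.

```python
from datetime import date

STUDY_END = date(2012, 12, 31)  # end of study follow-up stated in the text


def follow_up(diagnosis, death=None, death_from_breast_cancer=False,
              contralateral=None, emigration=None, study_end=STUDY_END):
    """Return (years_of_follow_up, event) for one patient.

    All argument names are hypothetical. event is 1 only when follow-up
    ends with a death attributed to breast cancer; otherwise the patient
    is treated as censored at the end of follow-up.
    """
    # Follow-up ends at the earliest of the recorded end dates.
    candidates = [d for d in (death, contralateral, emigration, study_end)
                  if d is not None]
    end = min(candidates)
    event = int(death is not None and death_from_breast_cancer and end == death)
    years = (end - diagnosis).days / 365.25  # assumed day-to-year conversion
    return years, event
```

For example, a patient who develops contralateral breast cancer before dying is censored at the contralateral diagnosis, so her later death does not count as an event.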

For comparison with previous studies, we also performed a 10-year analysis of distant metastasis-free survival. However, information on metastasis is less complete than information on death. In our study, approximately 2% of patients (14 of 727) died from breast cancer but had missing information on metastasis. Among the patients with ER-positive disease and available gene expression information who were included in this study (538 patients), 11 who died from breast cancer had missing information on metastasis. For these 11 patients, the date of death was used in place of the date of metastasis.
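
The substitution rule for missing metastasis dates can be written as a small helper. This is a sketch under assumed field names, not the authors' implementation.

```python
from datetime import date


def dmfs_event_date(metastasis=None, death=None, death_from_breast_cancer=False):
    """Return the date used as the distant-metastasis event, or None if no event.

    Argument names are hypothetical. A recorded metastasis date is always
    preferred; the date of death is used only for patients who died from
    breast cancer without a recorded metastasis date.
    """
    if metastasis is not None:
        return metastasis
    # Fallback described in the text: breast cancer death, metastasis date missing.
    if death is not None and death_from_breast_cancer:
        return death
    return None
```

Deaths from other causes without recorded metastasis return `None` here, meaning the patient contributes no metastasis event under this sketch.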

Kaplan–Meier analyses were performed by STO-3 trial arm and 70-gene risk classification. Statistical significance was assessed using the log-rank test.
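
To make the two techniques concrete, the following is a minimal pure-Python sketch of the Kaplan–Meier estimator and a two-group log-rank test. It is illustrative only: the study used statistical packages, and the function names and data layout (parallel lists of follow-up times and 0/1 event indicators) are assumptions.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) on 1 degree of freedom."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)        # at risk, both groups
        n_a = sum(1 for u in times_a if u >= t)    # at risk, group A
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group A event count at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value
```

In this layout, each group (for example, one trial arm within one risk class) is passed as its own pair of lists.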

Multivariable analysis by the 70-gene risk classification was performed using Cox proportional hazards modeling, adjusting for classical patient and tumor characteristics (age and calendar period of diagnosis, progesterone receptor status, HER2 status, Ki-67 status, tumor grade, and tumor size). Multivariable analysis for the ultralow 70-gene risk group by trial arm was not performed owing to the small sample size.
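
As a simplified illustration of how a Cox model is fitted, the sketch below estimates the log hazard ratio for a single covariate by Newton-Raphson on the partial likelihood, using the Breslow approximation for tied event times. It is deliberately univariable and is not the multivariable model described above; the function name and inputs are assumptions.

```python
import math


def cox_single_covariate(times, events, x, iterations=25, tol=1e-10):
    """Fit beta in h(t | x) = h0(t) * exp(beta * x); return (beta, hazard_ratio).

    Uses the Breslow approximation for ties. Assumes the covariate varies
    within at least one risk set and that the estimate is finite (the
    groups are not perfectly separated in time).
    """
    beta = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for _ in range(iterations):
        score, information = 0.0, 0.0
        for t in event_times:
            # Covariate values of everyone still at risk at time t.
            risk = [xj for u, xj in zip(times, x) if u >= t]
            s0 = sum(math.exp(beta * xj) for xj in risk)
            s1 = sum(xj * math.exp(beta * xj) for xj in risk)
            s2 = sum(xj * xj * math.exp(beta * xj) for xj in risk)
            # Covariate values of the patients with an event at time t.
            d_x = [xi for u, e, xi in zip(times, events, x) if u == t and e]
            score += sum(d_x) - len(d_x) * s1 / s0
            information += len(d_x) * (s2 / s0 - (s1 / s0) ** 2)
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta, math.exp(beta)
```

With `x` coded 1 for high risk and 0 for low risk, the second return value is the estimated hazard ratio for high versus low risk. A multivariable fit follows the same idea with a vector of coefficients and is best left to an established statistics package.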

Flexible parametric survival models were used to estimate hazard ratios over time since diagnosis, with breast cancer-specific death rates modeled using a restricted cubic spline function for the baseline mortality rate [20, 21]. Time-dependent multivariable analysis was performed for the 1-, 5-, 10-, 15-, and 20-year follow-up time points, adjusting for the same patient and tumor characteristics as listed above. A spline with three degrees of freedom was used to estimate the hazard ratios. For the time-dependent covariate (tamoxifen trial arm), we used a second spline function with one degree of freedom to model the interaction between the covariate and time. The stpm2 command in Stata version 14.2 was used for the modeling and the analyses [20].
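
For readers unfamiliar with restricted cubic splines, the sketch below evaluates the standard restricted cubic spline basis used in flexible parametric survival models, where the spline argument is typically the logarithm of time. It is not a reproduction of stpm2: the knot positions in the usage note are arbitrary, and the function name is an assumption.

```python
def rcs_basis(x, knots):
    """Evaluate a restricted cubic spline basis at x.

    knots is an increasing list whose first and last entries are the
    boundary knots. The basis has len(knots) - 1 terms, so two interior
    knots give three degrees of freedom and no interior knots give one
    (a term that is linear in x).
    """
    k_min, k_max = knots[0], knots[-1]

    def cube(v):
        # Truncated cubic: zero to the left of the knot.
        return max(v, 0.0) ** 3

    basis = [x]
    for k in knots[1:-1]:
        lam = (k_max - k) / (k_max - k_min)
        basis.append(cube(x - k) - lam * cube(x - k_min)
                     - (1 - lam) * cube(x - k_max))
    return basis
```

For instance, `rcs_basis(0.5, [0, 1, 2, 3])` returns three terms (three degrees of freedom), while `rcs_basis(0.5, [0, 3])` returns the single linear term that corresponds to a one-degree-of-freedom spline. The construction forces the fitted curve to be linear beyond the boundary knots, which keeps the tails of the curve from oscillating.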

The proportional hazards assumption for the main exposure variable (70-gene risk classification) was assessed by including a time-dependent covariate in the model. No significant deviation was noted. Data preparation and analysis were done using SAS version 9.4, Stata version 14.2, and R version 3.4.0.
