The Mann–Whitney U test was used to compare clinical data. Variables derived from gene expression analyses were assessed by multivariate regression using generalized linear models (Binomial, Poisson or Gamma families, depending on the response variable) to estimate odds ratios (OR) and confidence intervals (CI) while adjusting for the following confounding factors: age (continuous variable, by year), grading (binary variable, grade 3 vs. 1 or 2), quality score (discrete quantitative variable, 1 to 10), stage (binary variable, advanced vs. early), IHC subtype (categorical variable) and region of origin (categorical variable). A higher OR denoted concordance between higher gene expression values and increasing values of the variable (age, grading, quality and stage) or, for categorical variables, relative to a reference category (‘luminal A-like’ for IHC subtype and ‘East’ for region). Missing age entries were imputed using the median age of patients from the same country of origin. Patients with missing information in any other variable were excluded from regression analyses. Multivariate regressions were performed using the statsmodels library for Python (www.statsmodels.org). Linear correlations between variables were estimated as Pearson’s correlations. Survival analyses were performed on 400 patients from SSA (recruited between 2005 and 2017, maximum follow-up time of 36 months) using Kaplan–Meier estimators, with differences assessed by log-rank tests (unadjusted) or Cox proportional hazard models (adjusted for confounding factors). Cox proportional hazard models were implemented and plotted alongside the corresponding Kaplan–Meier curves using the Lifelines library for Python (https://lifelines.readthedocs.io/). Except when stated otherwise, all statistical analyses were performed using IBM SPSS Statistics or GraphPad Prism v9.
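The handling of missing data described above (median imputation of age by country of origin, followed by exclusion of patients with any other missing covariate) could be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the record layout, the field names (`country`, `age`, `grade`, `quality`, `stage`, `ihc_subtype`, `region`) and the example values are assumptions introduced here for illustration.

```python
from statistics import median

# Hypothetical covariate names; the source does not specify a data layout.
COVARIATES = ("grade", "quality", "stage", "ihc_subtype", "region")


def impute_age_by_country(patients):
    """Return copies of the records with missing ages (None) replaced by
    the median age of patients from the same country."""
    ages_by_country = {}
    for p in patients:
        if p["age"] is not None:
            ages_by_country.setdefault(p["country"], []).append(p["age"])
    medians = {c: median(ages) for c, ages in ages_by_country.items()}

    imputed = []
    for p in patients:
        q = dict(p)  # copy so the input records are left unchanged
        if q["age"] is None:
            # Stays None if no patient from that country has a known age.
            q["age"] = medians.get(q["country"])
        imputed.append(q)
    return imputed


def complete_cases(patients):
    """Keep only records with a known age and no missing covariate."""
    return [
        p for p in patients
        if p["age"] is not None and all(p[k] is not None for k in COVARIATES)
    ]


# Made-up example records (not data from the study).
base = {"grade": 1, "quality": 7, "stage": 0,
        "ihc_subtype": "luminal A-like", "region": "East"}
patients = [
    {**base, "country": "A", "age": 40},
    {**base, "country": "A", "age": 50},
    {**base, "country": "A", "age": None},                 # age is imputed
    {**base, "country": "B", "age": 61, "grade": None},    # excluded
]

cleaned = complete_cases(impute_age_by_country(patients))
```

In this toy input, the third record receives the median of the two known ages from country A, and the fourth record is dropped because its grade is missing. The resulting complete-case records are what would then be passed to the regression step.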