4.4. Survival Analysis of Core ADME Genes in Non-TCGA Cancer Datasets

Dong Gui Hu
Peter I. Mackenzie
Pramod C. Nair
Ross A. McKinnon
Robyn Meech

It was not possible to find independent datasets with both transcriptomic profiling and overall survival data for all 20 TCGA cancer types analysed in the present study. However, we were able to analyse the lung cancer dataset (KM-lung cancer) [41] and the breast cancer dataset (KM-BRCA) [42] from the Kaplan-Meier Plotter (https://kmplot.com) to validate our findings for three TCGA cancer types (BRCA, LUAD, LUSC). The KM-lung cancer dataset was established using gene expression data (Affymetrix HGU133A, HG133A+2, and HGU133+2) and clinicopathological parameters of 2437 patients collected from 14 independent published datasets (accessed 1 October 2020) [41]. Of these patients, 1925 had overall survival data available for analysis, including 865 patients with adenocarcinoma (KM-LUAD) and 675 patients with squamous cell carcinoma (KM-LUSC). The KM-BRCA dataset was likewise established using gene expression data (Affymetrix HGU133A, HGU133+2) and clinical data of 5139 patients collected from 35 independent Gene Expression Omnibus (GEO) datasets (accessed 1 October 2020) [42]. Of these patients, 1402 had overall survival data available for analysis.

Using the Kaplan-Meier Plotter (https://kmplot.com), we generated Kaplan-Meier plots and performed logrank tests for the same set of core ADME genes (Table S3) that were analysed for the three TCGA cancers (BRCA, LUAD, LUSC). As listed in Table S3, most of these genes had more than one probe set on the Affymetrix HGU oligo arrays, and we performed survival analysis for every probe set of every gene. This gave a total of 67, 56, and 54 independent logrank tests for the 19, 18, and 23 core ADME genes analysed in KM-BRCA, KM-LUAD, and KM-LUSC, respectively (Table S3). The raw logrank p-values of all probe sets were adjusted by Bonferroni correction separately for each of the three datasets (KM-LUAD, KM-LUSC, KM-BRCA), and a Bonferroni-corrected logrank p-value of < 0.05 was considered statistically significant. A gene was defined as significantly associated with overall survival (OS) only when all of its probe sets had a Bonferroni-corrected p-value of < 0.05. Conflicting results were seen for some genes. For example, all three DPYD probe sets in KM-LUAD had a Bonferroni-corrected p-value of < 0.05, but one probe set (1554534_at) was associated with unfavourable OS, whereas the other two (1554536_at, 204646_at) were associated with favourable OS (Table S3). Genes with conflicting results among their probe sets were considered not statistically significant.
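The decision rule described above can be illustrated with a minimal Python sketch. This is not the authors' code: the function name, the record layout, the placeholder gene and probe names, and the example p-values and hazard ratios are all hypothetical. The sketch also assumes that the direction of association is read from the hazard ratio for high versus low expression (HR > 1 taken as unfavourable OS), which the text does not state explicitly.

```python
from collections import defaultdict


def gene_level_calls(records, alpha=0.05):
    """Classify genes from per-probe-set logrank results in ONE dataset.

    records: list of (gene, probe_set, raw_p, hazard_ratio) tuples.
    Bonferroni correction multiplies each raw p-value by the number of
    tests performed in the dataset (capped at 1.0).
    """
    n_tests = len(records)
    by_gene = defaultdict(list)
    for gene, probe_set, raw_p, hr in records:
        adjusted_p = min(1.0, raw_p * n_tests)
        by_gene[gene].append((probe_set, adjusted_p, hr))

    calls = {}
    for gene, probes in by_gene.items():
        # Rule 1: every probe set of the gene must pass the corrected cutoff.
        if not all(adj_p < alpha for _, adj_p, _ in probes):
            calls[gene] = "not significant"
            continue
        # Rule 2: all probe sets must agree on the direction of association.
        # Assumption: HR > 1 means high expression goes with unfavourable OS.
        directions = {"unfavourable" if hr > 1 else "favourable"
                      for _, _, hr in probes}
        if len(directions) > 1:
            calls[gene] = "not significant (conflicting probe sets)"
        else:
            calls[gene] = directions.pop()
    return calls


# Invented values for illustration only; they are not taken from Table S3.
example_records = [
    ("GENE_A", "probe_a1", 0.001, 1.8),
    ("GENE_A", "probe_a2", 0.002, 1.5),
    ("GENE_B", "probe_b1", 0.001, 1.6),
    ("GENE_B", "probe_b2", 0.003, 0.6),
    ("GENE_C", "probe_c1", 0.020, 0.7),
]
example_calls = gene_level_calls(example_records)
```

In this sketch the correction factor is the number of records passed in, so each dataset (KM-BRCA, KM-LUAD, KM-LUSC) would be processed in a separate call, mirroring the separate adjustment described above.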

For each analysis, the Kaplan-Meier Plotter generated a hazard ratio (HR) and its 95% confidence interval (CI). The HR and 95% CI values for the survival analyses based on SLC15A2 expression from its two probe sets in KM-LUAD are given in Figure 5.
