All statistical analyses were developed using R packages. MicroRNAs that showed signal in both tumor and normal tissue in at least 50% of cases were included in analyses presented here (Supplementary Table S1). Affymetrix gene expression array data obtained from different platforms were combined using the “matchprobes” package in R. For all Affymetrix array data (CEL files on all samples), after scan values were normalized using RMA as implemented in Bioconductor in R. For genes with more than one probe set, the mean gene expression was calculated. The GEO accession number of these array data is GSE44021 for mRNA at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE44021 and GSE67268 for miRNA at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE67268
Paired t-tests were used to identify differences in matched tumor/normal samples for mRNA expression. To find miRNAs with significant fold changes, we applied the Wilcoxon method to the fold change data in log10 scale with Bonferroni correction at 0.05, which resulted in a threshold P-value of 1.92E-04 (0.05/260 miRNAs tested). Spearman correlations were used to evaluate the association between expressions of miRNA and mRNA. Nearly six million (267 miRNAs × 22,277 mRNA probes = 5,947,959) Spearman correlations and their corresponding P-values were computed. To address the multiple testing problem here we used a Bonferroni corrected P-value cut off of 8.40E-09 (0.05/5,947,959 correlations tested) to select significant miRNA–target gene pairs. We also explored associations between miRNA and mRNA expression and clinical/pathological variables using Spearman analysis. For all evaluations presented here (including relating expression to survival), we used the miRNA signals (average delta Ct) or mRNA signals (average) for tumor:normal expressed as fold change ratios. For each miRNA or mRNA, we applied the Kaplan-Meier method to visualize differences and the Log-Rank test to statistically compare survival by expression levels divided as high versus low expression.
To further explore patterns of expression of miRNAs visually, we performed hierarchical clustering of data from miRNA expression by case. For this clustering, missing values were replaced by the median for each probe, and data were transformed to normalize their distribution. The R function ‘heatmap’ was used to generate the heatmap with the method set to ‘ward’ to calculate the distance used for the hierarchical clustering. We also evaluated the 11 demographic/clinicopathologic variables shown in Supplementary Table S2 in relation to different clusters of patients identified as shown in Supplementary Figure S1.
We used Cox proportional hazards regression models to evaluate survival as the hazard ratio (HR) for miRNA and gene expression fold change with adjustment of the four clinical variables age, gender, metastasis, and stage. We coded the fold change variables for miRNA and gene expression in two ways. First we assigned a single ordinal variable to represent each of the four quantile intervals (as 0, 1, 2, 3 to represent values in the ranges of 0 to 25%, 25 to 50%, 50 to 75%, and 75 to 100% of the distribution, respectively). Second, we created indicator variables for each of the four quartiles so that we could compare Q2, Q3, and Q4 separately to Q1 as the reference category.
Do you have any questions about this protocol?
Post your question to gather feedback from the community. We will also invite the authors of this article to respond.