QUANTIFICATION AND STATISTICAL ANALYSIS

Sung-Hwan Moon
Chun-Hao Huang
Shauna L. Houlihan
Kausik Regunath
William A. Freed-Pastor
John P. Morris, IV
Darjus F. Tschaharganeh
Edward R. Kastenhuber
Anthony M. Barsotti
Rachel Culp-Hill
Wen Xue
Yu-Jui Ho
Timour Baslan
Xiang Li
Allison Mayle
Elisa de Stanchina
Lars Zender
David R. Tong
Angelo D’Alessandro
Scott W. Lowe
Carol Prives

Statistical significance between groups was calculated by two-tailed Student’s t test unless otherwise specified. Tumor-free survival in mouse experiments was evaluated by comparing Kaplan-Meier curves with the log-rank (Mantel-Cox) test. Prism 6 and Microsoft Excel were used to calculate the values. Power analysis was done using a two-group independent-sample t test. Significance values are p < 0.05 (*), p < 0.01 (**), and p < 0.001 (***) unless otherwise specified.
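As an illustration of the survival comparison, the following is a minimal Python sketch of a two-group log-rank (Mantel-Cox) test together with the significance-star convention stated above. It is not the Prism 6 implementation used in the study; the function names `logrank_test` and `stars` and the input layout (one follow-up time and one event flag per animal) are assumptions made for this example.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank (Mantel-Cox) test.

    times_*  : follow-up time for each animal
    events_* : 1 if a tumor was observed at that time, 0 if censored
    Returns (chi_square, p_value) for 1 degree of freedom.
    """
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Animals still at risk just before time t, per group.
        n_a = sum(1 for s, _, g in samples if s >= t and g == 0)
        n_b = sum(1 for s, _, g in samples if s >= t and g == 1)
        # Events occurring exactly at time t, per group.
        d_a = sum(1 for s, e, g in samples if s == t and e and g == 0)
        d_b = sum(1 for s, e, g in samples if s == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance is undefined for n == 1
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


def stars(p):
    """Map a p value to the star notation used in this section."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""
```

For example, `logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])` compares a group in which all tumors arise early with one in which all arise late; the returned p value can then be passed to `stars`.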

Expression data measured by microarray following p53 reactivation in liver tumors were described in previously published experiments (Tschaharganeh et al., 2014; Xue et al., 2007) and are deposited as NCBI Gene Expression Omnibus (GEO) dataset GSE52091. Data were RMA normalized, log-transformed, and condensed by Entrez gene ID using the affy package (Gautier et al., 2004) implemented in R (http://cran.r-project.org/). Differential gene expression between Day 0 and Day 8 samples was determined by empirical Bayesian analysis within the LIMMA package (Smyth, 2004). The sex of the cell line used to produce tumors in this model is not known, as the data were generated from embryos and are from a previously published dataset (Xue et al., 2007).

Expression data measured by RNA-seq are deposited as NCBI GEO dataset GSE121558. Raw sequencing data were analyzed by first removing adaptor sequences using Trimmomatic (Bolger et al., 2014). The trimmed RNA-seq reads were then aligned to the mouse GRCm38/mm10 genome with STAR (Dobin et al., 2013), and transcript counts were quantified using featureCounts (Liao et al., 2014) to generate a raw count matrix. Differential gene expression between p53WT and p53null hepatocytes was determined using the DESeq2 package (Love et al., 2014) implemented in R (http://cran.r-project.org/).

Cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE) (https://www.broadinstitute.org/ccle) were stratified based on p53 status: p53null lines were defined as those with a TP53 log2 copy number value below −1 or harboring a nonsense or frameshift mutation, whereas p53WT lines were defined as those with no homozygous copy number loss and no mutations. Microarray data from these cohorts were compared using the R package limma. To define a mevalonate gene expression signature value, each mevalonate pathway gene expression value was converted to a z-score, and the mean of the 17 gene z-scores was calculated for each cell line.
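The signature calculation can be sketched in Python as follows. This is an illustrative sketch only: the function name `signature_scores`, the nested-dictionary input layout, and the choice to standardize each gene across cell lines using the sample standard deviation are assumptions, since the text does not specify them.

```python
from statistics import mean, stdev


def signature_scores(expression):
    """Mean per-gene z-score for each cell line.

    expression: {gene: {cell_line: expression_value}} restricted to the
    pathway genes (17 mevalonate genes in the study). Every gene must
    cover the same cell lines.
    Returns {cell_line: signature_value}.
    """
    z_scores = {}
    for gene, by_line in expression.items():
        values = list(by_line.values())
        mu = mean(values)
        sd = stdev(values)  # assumption: sample standard deviation
        if sd == 0:
            # A gene with no variation carries no information; score it 0.
            z_scores[gene] = {line: 0.0 for line in by_line}
        else:
            z_scores[gene] = {line: (v - mu) / sd for line, v in by_line.items()}

    cell_lines = list(next(iter(expression.values())))
    return {
        line: mean(z_scores[gene][line] for gene in z_scores)
        for line in cell_lines
    }
```

With placeholder genes `GENE_1` and `GENE_2` measured in three hypothetical cell lines, `signature_scores({"GENE_1": {"A": 1, "B": 2, "C": 3}, "GENE_2": {"A": 10, "B": 20, "C": 30}})` assigns the lowest signature value to line `A` and the highest to line `C`.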

The following datasets were downloaded from the Cancer Genome Atlas (TCGA) data portal: transcriptome expression (RNA-Seq V2), somatic mutations, clinical data, and copy number variations (segmented copy number calls) for the following tumor types: Liver Hepatocellular Carcinoma (LIHC), Colon Adenocarcinoma (COAD) and Breast Invasive Carcinoma (BRCA). All samples with validated RNA-sequencing data for mRNA expression and DNA-sequencing data for somatic mutation analysis were included in the analysis. When available, the matched normal samples for each tumor type were downloaded and used for the analysis. For somatic mutations, samples that had no reported mutations in a particular gene were assumed to be WT for that gene.

For the purpose of the correlation analysis described in Table S5, the liver tumor samples were stratified by ABCA1 mutation status as WT or mutant. Tumor samples with no identified mutations in the ABCA1 gene were assumed to be WT, and samples with mutant ABCA1 were removed from the dataset. The transcriptome expression profiles were obtained from the RNA-seq V2 dataset. The Pearson correlation coefficients between the expression of each mevalonate pathway gene and the RNA expression of ABCA1 were calculated using MATLAB. The Pearson correlation coefficients and the associated p values were tabulated (Table S5).
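For reference, the correlation step can be sketched in Python as shown below. The study used MATLAB; this sketch only illustrates the standard Pearson formula and the t statistic (with n − 2 degrees of freedom) from which a two-tailed p value is conventionally derived. The function names are chosen for this example, and the p value lookup itself is left out because it requires a t-distribution routine outside the standard library.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_t_statistic(r, n):
    """t statistic for testing r == 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

In this setting, `x` would hold the expression of one mevalonate pathway gene and `y` the expression of ABCA1 across the ABCA1 WT tumor samples, with one call per pathway gene.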

For analysis of gene expression and plotting of boxplots showing differential gene expression between tumor and normal samples, the mRNA expression level of particular genes was assessed and compared using the appropriate TCGA RNA-seq V2 datasets. For the TCGA Liver Hepatocellular Carcinoma (TCGA-LIHC) dataset, RNA-seq data from normal (n = 50) samples were compared with tumor (n = 371) samples. Similarly, for TCGA Colon Adenocarcinoma (TCGA-COAD), RNA-seq data from normal (n = 41) samples were compared with tumor (n = 272) samples, and for TCGA Breast Invasive Carcinoma (TCGA-BRCA), RNA-seq data from normal (n = 110) samples were compared with tumor (n = 1037) samples.

To compare mRNA expression of specific genes in tumor samples with WT p53 versus samples with missense mutations in p53, the tumor samples were stratified by p53 status. For the TCGA human liver cancer samples, tumors with WT p53 (n = 133) were compared with tumors carrying missense mutations in p53 (n = 40).

In the boxplots, the central line represents the sample median, and the notches display the variability of the median between samples (95% confidence interval). The width of a notch is calculated such that boxplots whose notches do not overlap have different medians at the 5% significance level. Outliers are not displayed, for clarity of the figures.
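The notch construction can be illustrated with a short Python sketch. The text does not give the notch formula; the sketch assumes the widely used convention of median ± 1.57 × IQR / √n, and it uses the "inclusive" quartile method of Python's `statistics.quantiles`, which may differ slightly from the quartile definition used by the plotting software. The function names are chosen for this example.

```python
import math
from statistics import quantiles


def notch_interval(values):
    """Approximate 95% interval around the median of one boxplot group.

    Assumes the convention median +/- 1.57 * IQR / sqrt(n).
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return q2 - half_width, q2 + half_width


def notches_overlap(values_a, values_b):
    """True if the notch intervals of two groups overlap."""
    low_a, high_a = notch_interval(values_a)
    low_b, high_b = notch_interval(values_b)
    return low_a <= high_b and low_b <= high_a
```

Under this convention, `notches_overlap(normal_values, tumor_values)` returning `False` corresponds to the statement that the two medians differ at the 5% significance level; `normal_values` and `tumor_values` are hypothetical lists of expression values for one gene.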

In these cases, statistical significance was determined by p values computed with Welch’s two-tailed t test (assuming unequal variances). When appropriate, p values were corrected for multiple testing by controlling the False Discovery Rate (FDR) with the Benjamini-Hochberg procedure. Most analyses were performed, and the boxplots for the figures were plotted, using MATLAB.
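The two statistical steps named here can be sketched in Python as follows. The study used MATLAB; this sketch shows only the textbook definitions: Welch's t statistic with Welch-Satterthwaite degrees of freedom, and Benjamini-Hochberg adjusted p values. The function names are chosen for this example, and converting the t statistic to a p value is omitted because it requires a t-distribution routine outside the standard library.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    se_a = variance(sample_a) / n_a  # squared standard error, group A
    se_b = variance(sample_b) / n_b  # squared standard error, group B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

A gene would then be called significant at a chosen FDR threshold if its adjusted p value falls below that threshold; the threshold itself is not stated in this section.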

GO analysis was performed using DAVID Bioinformatics Resources. Genes that are significantly associated with p53 restoration (by Gene Ontology) are included in Table S2. Gene set enrichment analysis (Subramanian et al., 2005) was performed using GSEA v2.07 software. Gene sets used in this study are included in Table S3. A detailed description of GSEA methodology and interpretation is provided at https://www.broadinstitute.org/gsea/doc/GSEAUserGuideFrame.html. Significance of gene sets from the GSEA was based on the normalized enrichment score (NES) and the false discovery rate q-value (FDR q-val) to determine the probability that a gene set with a given NES represents a false-positive finding.
