Statistical Methods

Ranjan Batra; Thomas J. Stark; Elizabeth Clark; Jean-Philippe Belzile; Emily C. Wheeler; Brian A. Yee; Hui Huang; Chelsea Gelboin-Burkhart; Stephanie C. Huelga; Stefan Aigner; Brett T. Roberts; Tomas J. Bos; Shashank Sathe; John Paul Donohue; Frank Rigo; Manuel Ares, Jr.; Deborah H. Spector; Gene W. Yeo

This method is extracted from research article: Nat Struct Mol Biol, Oct 2016

RNA binding protein CPEB1 remodels host and viral RNA landscapes

DOI: 10.1038/nsmb.3310

qPCR results were compared pairwise, and P values were calculated with Student’s t-test on technical replicates. Error bars represent the standard deviation or standard error of the technical replicates, as stated in the respective figure legends.

Genome-wide alternative polyadenylation (APA) analysis was performed with the MISO algorithm v0.5.2, which yielded a Bayes factor and a delta Psi value for each event. The Bayes factor represents the weight of the evidence in the data in favor of differential expression versus no differential expression, as described by Katz et al. ²⁷. We used a Bayes-factor threshold of 10,000 and required difference values (delta Psi) with an absolute value of at least 0.03. A Bayes factor of 10,000 means that an APA switch is 10,000 times more likely to have occurred than not.

For gene expression, reads were trimmed of adaptor sequences and low-quality bases and then mapped with GSNAP to both the human genome (hg19 build) and the HCMV Merlin genome (GenBank AY446894.2). Reads that mapped to repetitive elements were additionally filtered out. Gene expression values (RPKM ⁶¹) were calculated within each sample, and Z-score analysis was used to identify significant differences in expression, as previously described ⁶².

For alternative splicing (AS) analysis, we followed the procedure described in Charizanis et al. ¹⁹. For each pair of reads that spanned one or more exons (up to three, which is sufficient in practice given the fragment size), all possible isoforms (paths) between the anchored ends were found, and the probability that each isoform was the actual origin of the paired-end reads was estimated. Each inferred fragment was assigned a probability score. This junction inference step substantially increased the effective number of fragments supporting exon junctions, especially for cassette exons, and increased the statistical power to detect splicing changes.
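The APA thresholds stated above (a Bayes factor of 10,000 and an absolute delta Psi of at least 0.03) can be illustrated with a minimal Python sketch. It is not the authors' code and it does not parse MISO output: the `ApaEvent` record, its field names, and the example values are hypothetical, and treating the Bayes-factor threshold as inclusive (greater than or equal to) is an assumption, since the text does not say whether the cutoff is strict.

```python
from dataclasses import dataclass

# Thresholds taken from the text; the inclusive comparison for the
# Bayes factor is an assumption.
BAYES_FACTOR_MIN = 10_000
DELTA_PSI_MIN = 0.03


@dataclass
class ApaEvent:
    """Hypothetical record for one APA event (not a MISO data structure)."""
    event_id: str
    bayes_factor: float
    delta_psi: float


def is_significant(event, bf_min=BAYES_FACTOR_MIN, dpsi_min=DELTA_PSI_MIN):
    """Return True if the event passes both thresholds."""
    return event.bayes_factor >= bf_min and abs(event.delta_psi) >= dpsi_min


def filter_events(events):
    """Keep only the events that pass both thresholds."""
    return [e for e in events if is_significant(e)]


# Made-up example values, for illustration only.
example_events = [
    ApaEvent("event_1", 20_000, 0.05),   # passes both thresholds
    ApaEvent("event_2", 20_000, -0.10),  # passes: absolute delta Psi is used
    ApaEvent("event_3", 500, 0.50),      # fails the Bayes-factor threshold
    ApaEvent("event_4", 20_000, 0.01),   # fails the delta Psi threshold
]
significant_ids = [e.event_id for e in filter_events(example_events)]
```

Because the sign of delta Psi is discarded, shifts in either direction are retained by this filter.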
The weighted numbers of exon and exon-junction fragments uniquely supporting the inclusion or skipping isoform of each cassette exon were counted, and Fisher’s exact test was used to evaluate the statistical significance of splicing changes using both exon and exon-junction fragments, followed by Benjamini-Hochberg correction for multiple hypothesis testing to estimate the false discovery rate (FDR). Differential splicing events were identified by requiring FDR < 0.05 and |ΔI| ≥ 0.1.

For TAIL-seq data analysis, image files were downloaded from the MiSeq and processed with tailseeker2 ²⁵ to determine accurate poly(A) tail lengths, yielding paired fastq files corresponding to the 5’ (R5) and 3’ (R3, poly(A) tail) ends of each read. Reads were aligned against the human genome (hg19) and the viral genome (Human_Herpesvirus_5_strain_Merlin) using STAR with default parameters. Features were assigned using Subread with GENCODE v19 annotations and with Human_Herpesvirus_5_strain_Merlin features, and filtered to retain only uniquely mapped protein-coding genes. For the analysis of virally mapped reads, all genes were counted. Reads with a tail length of 0 were removed. For genes with at least 20 mapped reads, median tail lengths were measured, and the global distributions of these medians were compared between samples using the Kolmogorov-Smirnov test. For each gene captured in all samples with at least 20 mapped reads, individual tail-length distributions were compared among samples using the Mann-Whitney U test with a P value cutoff of 0.025.
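The tail-length filtering described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' pipeline. The input layout of `(gene, tail_length)` pairs and both function names are hypothetical. The sketch also assumes that the 20-read minimum is applied after zero-length tails are removed, which the text does not specify. The second function computes only the two-sample Kolmogorov-Smirnov statistic D; obtaining a P value, or running the Mann-Whitney U test, would normally be left to a statistics library such as SciPy.

```python
from collections import defaultdict
from statistics import median

MIN_READS = 20  # per-gene read threshold stated in the text


def median_tail_lengths(reads, min_reads=MIN_READS):
    """Return {gene: median tail length} for genes with enough reads.

    `reads` is an iterable of (gene, tail_length) pairs; this layout is
    a hypothetical stand-in for the real per-read annotations.
    """
    by_gene = defaultdict(list)
    for gene, length in reads:
        if length > 0:  # drop reads whose measured tail length is 0
            by_gene[gene].append(length)
    # Assumption: the read-count threshold is applied after removing
    # zero-length tails.
    return {g: median(v) for g, v in by_gene.items() if len(v) >= min_reads}


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D (no P value).

    D is the largest absolute difference between the two empirical
    cumulative distribution functions.
    """
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare ECDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Made-up reads: gene A has 21 non-zero tails, gene B only 19.
example_reads = (
    [("A", 0)] * 5
    + [("A", n) for n in range(1, 22)]
    + [("B", 50)] * 19
)
medians = median_tail_lengths(example_reads)
```

In this example gene B is dropped because it has fewer than 20 reads, and the five zero-length reads for gene A do not contribute to its median.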

Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
