Gene-by-gene one-way ANOVAs to identify significantly enriched (i.e., RBP-associated) transcripts

J Christopher Rounds
Edwin B Corgiat
Changtian Ye
Joseph A Behnke
Seth M Kelly
Anita H Corbett
Kenneth H Moberg

Gene-by-gene ANOVAs and post hoc tests for the 5760 genes in the “testable” set, along with bar graphs of IP/Input values, were generated in Prism 8 for Windows 64-bit (GraphPad Software). Custom R and Prism scripts were written to generate and label the 5760 Prism data tables (one per testable gene) required for this analysis, and custom R scripts were written to extract and combine the outputs of each test; all of these scripts are available in Supplementary File S3. See Results for a summary, and see below for a detailed discussion of the statistical testing used to define the testable transcript set and to identify significantly enriched (i.e., RBP-associated) transcripts in our RIP-Seq results.
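The table-generation step can be pictured with a minimal Python sketch. It is not the R/Prism code in Supplementary File S3: the long-format input layout, the column names, the group labels, the file-naming scheme, and the plain CSV output are all assumptions made for illustration, and the values are invented.

```python
import csv
import io
from collections import defaultdict

# Hypothetical long-format input: one row per gene, group, and replicate.
# Gene names, group labels, and values are invented for illustration.
LONG_FORMAT = """gene,group,replicate,ip_over_input
geneA,Control,1,0.5
geneA,Control,2,0.25
geneA,Nab2,1,1.5
geneA,Nab2,2,2.0
geneA,Atx2,1,1.0
geneA,Atx2,2,1.25
geneB,Control,1,0.5
geneB,Control,2,0.5
geneB,Nab2,1,0.5
geneB,Nab2,2,0.75
geneB,Atx2,1,3.0
geneB,Atx2,2,2.5
"""

GROUP_ORDER = ["Control", "Nab2", "Atx2"]  # assumed column order


def build_gene_tables(rows):
    """Group long-format rows into {gene: {group: [values by replicate]}}."""
    tables = defaultdict(lambda: defaultdict(list))
    ordered = sorted(rows, key=lambda r: (r["gene"], r["group"], int(r["replicate"])))
    for row in ordered:
        tables[row["gene"]][row["group"]].append(float(row["ip_over_input"]))
    return {gene: dict(groups) for gene, groups in tables.items()}


def table_to_csv(groups, group_order=GROUP_ORDER):
    """Render one gene's table: one column per group, one row per replicate."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(group_order)
    n_rows = max(len(groups.get(g, [])) for g in group_order)
    for i in range(n_rows):
        # Leave a cell empty if a group has fewer replicates than the others.
        writer.writerow(
            [groups[g][i] if i < len(groups.get(g, [])) else "" for g in group_order]
        )
    return buf.getvalue()


rows = list(csv.DictReader(io.StringIO(LONG_FORMAT)))
gene_tables = build_gene_tables(rows)
# Label each table by gene (hypothetical naming scheme).
labeled_csv = {f"{gene}.csv": table_to_csv(t) for gene, t in gene_tables.items()}
print(labeled_csv["geneA.csv"])
```

Keeping one small table per gene mirrors the one-table-per-ANOVA layout described above, so each test's output can later be matched back to its gene by label.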

To identify RNA targets of Nab2 and Atx2 (that is, RNAs enriched in either Nab2 RIP or Atx2 RIP samples relative to control RIP), directly comparing normalized read counts between RIP samples is insufficient. Differences in RNA expression between samples must be accounted for, as these differences can partially or wholly explain differences in the amount of RNA isolated by IP. We employed a common solution to this problem used in RIP- and ChIP-qPCR (Zhao et al. 2010; Aguilo et al. 2015; Li et al. 2019), scaling normalized RIP reads for each gene in each sample by the corresponding number of normalized Input reads. For clarity, we describe these values as “IP/Input”; they are commonly referred to as “Percent Input” or “% Input.” These IP/Input values could then be compared between samples by further normalizing them to elav-Gal4 alone controls. In this way, RIP fold enrichment values, appropriately normalized to library size/composition and gene expression, were calculated for each gene in each sample.

To promote the reliability of our analyses and increase our statistical power to detect differences in fold enrichment, we limited further analyses to a testable set of 5760 genes out of the 17,753 total genes annotated in the BDGP6.22 genome. The testable gene set was defined as those genes with detectable expression in all twelve Input samples and an average normalized read count >10 in either Nab2 or Atx2 RIP samples. These criteria were based on those used in Lu et al. (2014) and Malmevik et al. (2015).

In this defined gene set, differences in fold enrichment were statistically tested using gene-by-gene one-way ANOVAs (Li et al. 2019) in Prism 8 (GraphPad Software), applying Dunnett’s post hoc test to calculate P-values only for the comparison of each experimental sample to the control sample (Dunnett 1955). In each case, P-values were adjusted to correct for multiple hypothesis testing only within each gene-by-gene ANOVA. This approach identified a small, focused set of statistically significantly enriched RNAs, suggesting that additional corrections across all genes to control type I error (i.e., false positives) were not necessary (Rothman 1990). Given the comparatively low read depth, likely caused by incomplete rRNA depletion during library preparation, we suspect that this approach, rather than failing to adequately control type I error, yields only a partial census of Nab2- and Atx2-bound RNAs in vivo.
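For a single gene, the IP/Input scaling, the testable-set filter, and the one-way ANOVA described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Prism workflow itself: the read counts are invented, three replicates per group are used only for brevity, "detectable expression" is taken to mean more than zero normalized Input reads, and normalization to control is taken to mean division by the mean control IP/Input value. The sketch stops at the F statistic. P-values and Dunnett's post hoc comparisons need the F and multivariate t distributions, which the source obtained from Prism; in Python, SciPy's `scipy.stats.f_oneway` and `scipy.stats.dunnett` would be one option.

```python
from statistics import mean


def ip_over_input(rip_reads, input_reads):
    """Scale normalized RIP reads by the matching normalized Input reads."""
    return [r / i for r, i in zip(rip_reads, input_reads)]


def is_testable(all_input_reads, nab2_rip, atx2_rip, min_mean=10):
    """Apply the two testable-set criteria to one gene.

    Assumption: 'detectable' means > 0 normalized reads in every Input sample.
    """
    detectable = all(x > 0 for x in all_input_reads)
    return detectable and (mean(nab2_rip) > min_mean or mean(atx2_rip) > min_mean)


def fold_enrichment(values, control_values):
    """Assumption: normalize by dividing by the mean control IP/Input value."""
    control_mean = mean(control_values)
    return [v / control_mean for v in values]


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Invented normalized read counts for one gene (control, Nab2, Atx2).
input_reads = {"Control": [64, 64, 64], "Nab2": [128, 64, 64], "Atx2": [64, 32, 64]}
rip_reads = {"Control": [8, 16, 24], "Nab2": [96, 64, 80], "Atx2": [16, 16, 48]}

all_inputs = [x for reads in input_reads.values() for x in reads]
testable = is_testable(all_inputs, rip_reads["Nab2"], rip_reads["Atx2"])

ratios = {g: ip_over_input(rip_reads[g], input_reads[g]) for g in rip_reads}
folds = {g: fold_enrichment(ratios[g], ratios["Control"]) for g in ratios}

f_stat, df_b, df_w = one_way_anova_f([folds["Control"], folds["Nab2"], folds["Atx2"]])
print(testable, folds["Nab2"], round(f_stat, 3), df_b, df_w)
```

In this toy example, the first Nab2 replicate has the most RIP reads (96) but also twice the Input reads, so its IP/Input value is the lowest of the three Nab2 replicates. This is the expression effect that the scaling is meant to remove.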
