iCLIP and RNA sequencing data analysis

MR Madara Ratnadiwakara
SA Stuart K Archer
CD Craig I Dent
IM Igor Ruiz De Los Mozos
TB Traude H Beilharz
AK Anja S Knaupp
CN Christian M Nefzger
JP Jose M Polo
MA Minna-Liisa Anko
request Request a Protocol
ask Ask a question
Favorite

The iCLIP data was analysed using iCount as described (Änkö et al., 2012; König et al., 2010; Müller-McNicoll et al., 2016; Wang et al., 2010). In short, adapters and barcodes were removed from all reads before mapping to the mouse mm10 genome assembly. Uniquely mapping reads were kept to determine statistical significant crosslink sites. For this, iCLIP positions were randomised within co-transcribed regions and statistically significant binding sites (false discovery rate, FDR < 0.05) were calculated. Three biological replicates for SRSF3-BAC and EGFP-NLS control ESCs were pooled for the analysis.

K-mer analysis was performed for motif search as described (König et al., 2010; Wang et al., 2010). One occurrence of k-mer (4-mers for noncoding RNAs and 5-mers for all RNAs) within the evaluated interval (−30,–5)(+5,+30) around the crosslink site was counted and weighted by 1.0. Reference data was generated 100 times by random shuffling of crosslink sites within corresponding genome segments and z-score was calculated relative to the randomized positions. The consensus binding motif was calculated from top 15 using MEME (Bailey et al., 2006). Gene Ontology analysis was performed using DAVID (Huang et al., 2009)

Paired-end RNA-seq reads aligned to the mm10 reference genome using STAR (v2.5.2) with default parameters (Dobin et al., 2013). The iPS-1 and iPS-2 control replicates were merged using samtools-merge (Samtools v1.2); producing a combined alignment for the control condition. Differential splicing analysis was done using MISO v0.5.4 (LINK1; in a relaxed configuration where filter_results = False) and the mm10 alternative event annotation (version 2; LINK2). For each AS event type, the treatment replicates were compared to the combined control (using miso_compare); this identified putative treatment-sensitive AS events. The results were then filtered using MISO’s filter_events command (FILTERS: num-inc 1 = at least one inclusion read, num-exc 1 = at least one exclusion read, num-sum-inc-exc 10 = at least 10 reads of any kind in one of the samples’ delta-psi 0.20 = minimum 0.2 delta PSI, bayes-factor 5 = bayes factor at least 5). Similarly, the paired-end MEF-to-iPSC time-course reads were aligned to the mm10 reference genome using STAR (v2.5.2). The differential splicing analysis was done as above.

Splice junctions from the MEF-to-iPSC time-course were reconstituted from the corresponding transcript annotations, flanked by 100 nt up- and downstream of each junction. The resulting alignments were used as input for JunctionSeq analysis (Anders et al., 2012; Hartley and Mullikin, 2016), which is based on the DEX-seq model but has been reported to detect readily novel splice junctions and intron retention. Briefly, exon coordinates from transcript models with a Transcript Support Level of 1 (the most stringent) were extracted and used as input for QoRTS (Hartley and Mullikin, 2015), together with the alignments from the time-course RNA-seq data. The capability of QoRTS to call novel splice junctions from the data was enabled. The output from QoRTS, a flat exon/junction annotation file, was used together with the alignment data as input for the JunctionSeq software to detect differential exon/junction usage between time-points. Each time-point was contrasted against the original MEF cells (day-0) in one JunctionSeq run. In another run, each time-point was contrasted to the previous time-point. In a third run, all time-points were considered together in the one linear model to test for any effect at all of time-point on exon/junction usage. Log2-fold changes reported are all variance-stabilised transformation of the original log2 fold-changes in expression estimates. In each of the pairwise contrast between timepoints, exonic/junction, features with a FDR < 0.05 were classed as 'regulated', while others were classed either as unregulated (FDR > 0.05), or 'untestable' (too few reads observed in both time points), by JunctionSeq.

Data for SRSF3 iCLIP was filtered for sites that also appeared in the GFP iCLIP control, then converted to bed format. This data was then merged with the exon/junction feature annotations from QoRTS (including de novo junction calls) using BedTools. Detained intron data (Boutz et al., 2015) was similarly merged with the QoRTS annotation. For k-means clustering of time-point data, a minimum autocorrelation filter (correlation between consecutive time-points) was set at 0.1, and maximum and minimum log2 fold changes of 8.5 and 0.5, respectively, were also applied. The number of clusters was set to 20. The interaction analysis used for Figure 3 was performed with STRING (Szklarczyk et al., 2015).

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

post Post a Question
0 Q&A