After filtering out partial isoforms, including 5′ degradation products, using TAMA’s script (tama_remove_fragment_models.py) with default parameters (Kuo et al., 2017), isoforms detected using SMRT sequencing were characterized and classified using SQANTI2 (v7.4) (Tardaguila et al., 2018) in combination with the GENCODE comprehensive gene annotation (human: v31, mouse: vM22), FANTOM5 CAGE peaks (Lizio et al., 2019) (human: hg38, mouse: mm10), polyA motifs, the Intropolis junction dataset (Nellore et al., 2016) or a STAR output junction file, full-length (FL) read counts (abundance file), and Kallisto counts from mouse and human fetal RNA-Seq data. An isoform was classified as a full splice match (FSM) if it matched a reference transcript at every splice junction and contained the same number of exons, as an incomplete splice match (ISM) if it contained fewer 5′ exons than the reference transcript, as novel in catalog (NIC) if it was a novel isoform containing a new combination of known donor and acceptor sites, or as novel not in catalog (NNC) if it was a novel isoform with at least one novel donor or acceptor site. Depictions of the RNA isoform classifications can be found in Figure 2A. Potential artifacts such as reverse transcription jumps or intrapriming of intronic lariats were filtered out using the SQANTI2 filter script with an intrapriming rate of 0.6. Fusion transcripts, intron retention, polyA motifs, and proximity to CAGE peaks were identified from the SQANTI2-filtered isoforms. The occurrence of mutually exclusive exons (MX) and skipped exons (SE) was assessed using SUPPA2 (Trincado et al., 2018) with the parameter -f ioe, intron retention (IR) was assessed with SQANTI2, and alternative first exons (AF), alternative last exons (AL), alternative 5′ splice sites (A5), and alternative 3′ splice sites (A3) were assessed using custom scripts based on splice junction coordinates. Isoforms were classified as long non-coding RNA (lncRNA) using SQANTI2 in combination with the GENCODE long non-coding RNA gene annotation (human: v31, mouse: vM22).
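The four structural categories can be illustrated with a minimal Python sketch that compares splice junction chains. This is a simplified illustration of the definitions given above, not SQANTI2's implementation: the function name `classify_isoform` and the junction representation are hypothetical, junctions are assumed to be listed in 5′ to 3′ transcript order, and only multi-exon isoforms are handled.

```python
from typing import Sequence, Tuple

# One intron, as (donor, acceptor) genomic coordinates (hypothetical representation).
Junction = Tuple[int, int]


def classify_isoform(query: Sequence[Junction],
                     references: Sequence[Sequence[Junction]]) -> str:
    """Classify a multi-exon isoform as FSM, ISM, NIC, or NNC.

    Junction chains are assumed to be ordered 5' to 3' along the transcript.
    """
    q = list(query)
    if not q:
        raise ValueError("this sketch only handles multi-exon isoforms")
    refs = [list(r) for r in references]

    # FSM: identical junction chain, which also implies the same exon count.
    if any(q == r for r in refs):
        return "FSM"

    # ISM (as defined in the text): 5' exons are missing, so the query chain
    # is a proper 3' suffix of a reference chain.
    if any(len(q) < len(r) and r[-len(q):] == q for r in refs):
        return "ISM"

    # Collect every annotated donor and acceptor site.
    known_donors = {d for r in refs for d, _ in r}
    known_acceptors = {a for r in refs for _, a in r}

    # NIC: every site is annotated, but the combination is new.
    if all(d in known_donors and a in known_acceptors for d, a in q):
        return "NIC"

    # NNC: at least one donor or acceptor site is not annotated.
    return "NNC"
```

For example, against a single reference with junctions `[(100, 200), (300, 400), (500, 600)]`, the chain `[(100, 400), (500, 600)]` reuses known sites in a new combination and would be labeled NIC, whereas `[(100, 250), (500, 600)]` introduces an unannotated acceptor and would be labeled NNC.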
Open reading frames (ORFs) were predicted using CPAT (v3.0.2) with default parameters, and transcripts were classified as protein-coding if the coding potential score was ≥ 0.364 for human or > 0.44 for mouse (Table S21).
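The species-specific cutoffs can be applied as in the following Python sketch. The names `CPAT_CUTOFFS` and `is_protein_coding` are hypothetical, and the function assumes the coding potential score has already been read from the CPAT output. The inclusive comparison for human and the strict comparison for mouse follow the thresholds exactly as stated above.

```python
# Coding potential cutoffs as stated in the text (hypothetical constant name).
CPAT_CUTOFFS = {"human": 0.364, "mouse": 0.44}


def is_protein_coding(coding_prob: float, species: str) -> bool:
    """Return True if a transcript passes the species-specific CPAT cutoff."""
    cutoff = CPAT_CUTOFFS[species]  # raises KeyError for unsupported species
    if species == "human":
        # Human: score >= 0.364 (inclusive, as stated in the text).
        return coding_prob >= cutoff
    # Mouse: score > 0.44 (strict, as stated in the text).
    return coding_prob > cutoff
```

A transcript scoring exactly at the cutoff is therefore called protein-coding in human but not in mouse.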