RNA-seq data of poly(A)+ RNA for the HepG2 cell line (accession numbers ENCFF670LIE and ENCFF074BOV) were downloaded in BAM format from the ENCODE Consortium website [130]. RNA-seq data from short hairpin RNA knockdown (shRNA-KD) of 250 RBPs [65] were downloaded in BAM format from the ENCODE data repository [130,131] (Table S3). RNA-seq data of poly(A)+ RNA from wild-type Amr and Rpb1 C4/R749H mutant HEK293 cells treated with α-amanitin for 42 h were downloaded from the Gene Expression Omnibus (GSE63375) [71]. RNA-seq data of poly(A)+ RNA from the A549 cell line treated with α-amanitin were obtained as explained below and mapped to the GRCh37 human genome assembly using STAR aligner v2.3.1z with the default settings [132]. The coordinates of circRNAs expressed in liver tissue and their associated SRPTM metrics (the number of circular reads per number of mapped reads per read length) were obtained from the TCSD database [47]. The genomic coordinates of adenine branch point nucleotides were selected from the validated set of branch points expressed in K562 cells [42].

RNA-seq experiments were processed with the IPSA pipeline to obtain split read counts supporting splice junctions [133]. Split read counts were filtered by the entropy content of the offset distribution, annotation status, and canonical GT/AG dinucleotides at splice sites with the default settings, and then pooled across bioreplicates. The exon inclusion rate (Ψ, PSI, or Percent-Spliced-In) was calculated according to the equation

$$\Psi = \frac{\mathrm{inc}}{\mathrm{inc} + 2\,\mathrm{exc}},$$

where inc is the number of reads supporting exon inclusion and exc is the number of reads supporting exon exclusion. Ψ values with a denominator below 10 were considered unreliable and discarded. Differential exon inclusion between a pair of conditions (shRNA-KD vs. non-specific control and α-amanitin vs. untreated control) was assessed as described previously [134].
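The pooling and filtering steps can be illustrated with a minimal Python sketch. It is not the IPSA implementation: the function names `pool_counts` and `psi` are hypothetical, and the default weight of 2 on exclusion reads is an assumption (an included exon is supported by two junctions, a skipped exon by one), exposed as the `exc_weight` parameter so that it can be matched to the equation in use. The threshold of 10 on the denominator is the one stated in the text.

```python
from typing import Iterable, Optional, Tuple


def pool_counts(replicates: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Sum (inc, exc) split read counts across bioreplicates."""
    inc_total = 0
    exc_total = 0
    for inc, exc in replicates:
        inc_total += inc
        exc_total += exc
    return inc_total, exc_total


def psi(inc: int, exc: int, exc_weight: int = 2,
        min_denominator: int = 10) -> Optional[float]:
    """Exon inclusion rate, or None when the estimate is unreliable.

    exc_weight is an assumed weighting of exclusion reads (hypothetical
    default of 2); min_denominator is the reliability cutoff from the text.
    """
    if inc < 0 or exc < 0:
        raise ValueError("read counts must be non-negative")
    denominator = inc + exc_weight * exc
    if denominator < min_denominator:
        return None  # discarded: too few supporting reads
    return inc / denominator


# Two hypothetical bioreplicates of the same exon.
pooled_inc, pooled_exc = pool_counts([(12, 3), (18, 7)])
pooled_psi = psi(pooled_inc, pooled_exc)
```

Returning `None` rather than a number keeps discarded exons distinguishable from exons with a genuine Ψ of 0.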

Cryptic and actively expressed splice sites in the human transcriptome were identified using genomic alignments of RNA-seq samples from the GTEx Consortium [43]. Splice sites with the canonical GT/AG dinucleotides were called from split read alignments and ranked by the total number of supporting split reads pooled across all 8551 samples. Among the splice sites supported by at least three split reads, the top 2% were referred to as active and the bottom 2% as inactive. To identify cryptic splice sites, we applied the same strategy as in ref. 135: we scanned the intron sequences for sites with a MaxEntScan score >800 for donor sites and >950 for acceptor sites [136], and excluded splice sites that were detected in GTEx or present in the genome annotation. The MaxEntScan score thresholds were chosen so that the number of cryptic splice sites was comparable to that in the active and inactive sets above.
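The ranking step that defines the active and inactive sets can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `classify_sites`, the dictionary input keyed by a site identifier, the rounding down of the 2% cutoff, and the tie-breaking by identifier are all hypothetical choices that the text does not specify.

```python
from typing import Dict, List, Tuple


def classify_sites(support: Dict[str, int], min_reads: int = 3,
                   percent: int = 2) -> Tuple[List[str], List[str]]:
    """Split well-supported splice sites into active and inactive sets.

    support maps a site identifier to its pooled split read count.
    Sites with fewer than min_reads reads are ignored; the top and
    bottom `percent` of the remainder are returned, strongest first.
    """
    eligible = sorted(
        ((site, n) for site, n in support.items() if n >= min_reads),
        key=lambda item: (-item[1], item[0]),  # by count, then by name
    )
    k = len(eligible) * percent // 100  # assumed: cutoff rounded down
    active = [site for site, _ in eligible[:k]]
    inactive = [site for site, _ in eligible[len(eligible) - k:]] if k else []
    return active, inactive


# Hypothetical input: 100 sites with 3..102 reads, plus two weak sites.
example = {f"site{i}": i + 3 for i in range(100)}
example.update({"low1": 1, "low2": 2})
active_sites, inactive_sites = classify_sites(example)
```

With fewer than 50 eligible sites the 2% cutoff rounds down to zero and both sets come back empty, so the sketch is only meaningful on large collections such as the pooled GTEx calls.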
