RRTS calculation (Ribosome ReadThrough Score)
Data analysis is performed in python 2.7
All code is available here: https://github.com/jrw24/G418_readthrough
For RRTS specific details, see the python script here: https://github.com/jrw24/G418_readthrough/blob/master/riboseq/riboseq_buildDenseTables_rpkm_utr3adj.py
Step-by-step description of RRTS calculation:
- Prior to performing this analysis, footprint assignment must be completed meaning that each ribosome protected fragment (RPF) has been mapped to a transcript specific position as a function of the fragment length and counts have been normalized by sequencing depth.
- Each transcript is saved as a list with each position representing a nucleotide
- The variable name is: exonsplicedcounts
- For example, a 9 nucleotide transcript (x) with 3 ribosomes could look like:
x = [ 0, 0, 0, 0, 2, 0, 0, 1, 0] - A key step in the protocol is trimming regions of each transcript. To trim the last 3 nucleotides of this transcript, you could use:
x = x[:-3]
Now this transcript has a length of 6 and 2 total counts
[ 0, 0, 0, 0, 2, 0 ] - I set a dictionary with insets (nucleotides to be trimmed from each region)
defaultInsets = { 'utr5Inset3' : 6, 'cdsInset5' : 18, 'cdsInset3' : 15, 'utr3Inset5' : 6 }
- Iterate through all transcripts in the transcriptome annotation and perform the following steps:
- Remove transcripts with 5’UTR length of zero
- Remove transcripts with 3’UTR length of zero
- Define the CDS start and end:
cdsstart = utr5len
cdsend = len(exonsplicedcounts) - utr3len
- Verify transcripts have a CDS length greater than zero
- Modify 5’UTR, 3’UTR, CDS and mRNA region lengths to exclude start codon and stop codon peaks from calculation.
- utr5len = utr5len – 6
- cdslen = cdslen – 18 – 15
- utr3len = utr3len – 6
- mrnalen = utr5len + cdslen +utr3len
- calculate adjusted 3’UTR length (utr3LenAdj) based on the position of the first inframe stop codon
- inframe stop codon positions are identified using this script:
https://github.com/jrw24/G418_readthrough/blob/master/utils/stopcodon_finder.py - If there are no in-frame stop codons in the 3’UTR, set utr3LenAdj to utr3len
- Filter zero length 5’UTRs and 3’UTRs after trimming
- Sum RPF counts in each region, excluding counts in trimmed positions
utr5Counts = sum(exonsplicedcounts[:cdsstart-insets['utr5Inset3']])
utr3Counts = sum(exonsplicedcounts[cdsend+insets['utr3Inset5']:])
cdsCounts = sum(exonsplicedcounts[cdsstart+insets['cdsInset5']:cdsend-insets['cdsInset3']])
mrnaCounts = utr5Counts+cdsCounts+utr3Counts
- Sum counts in adjusted 3’UTRs excluding the first 6 nucleotides after the normal stop codon.
utr3AdjCounts = sum( exonsplicedcounts[cdsend+insets['utr3Inset5']:cdsend+utr3LenAdj] )
- If the adjusted 3’UTR length (utr3LenAdj-6) is less than 12, exclude this transcript
- Calculate densities for each region by dividing counts over length
- Convert densities to rpkm by multiplying by 1000 (reads per kilobase per million of mapped reads)
- Exclude transcript with zero CDS densities
- Exclude transcript if there are less then 128 counts in trimmed CDS region
- Exclude transcripts with CDS densities less than 0.001
- Calculate RRTS by dividing 3’UTR adjusted density by CDS density
- RRTS = utr3AdjDensity_rpkm/cdsDensity_rpkm
- After completing calculation for all transcripts, write results to a csv file.
Do you have any questions about this protocol?
Post your question to gather feedback from the community. We will also invite the authors of this
article to respond.