Identification of pause events
Starting data: the mRNA sequences of all the genes in a genome, and the ribosome occupancy mapped to each mRNA nucleotide from a ribosome profiling experiment. (The value at each nucleotide typically represents the occupancy of the ribosome whose A-site is at that nucleotide.)
A. Calculating parameters of ribosome occupancy in the codons of every gene.
1. For each gene, calculate the median ribosome density over all of its nucleotides.
2. Choose a threshold below which a gene’s ribosome density is considered insufficient for analysis. For example, genes with a median density < 0.2 can be discarded.
3. Normalize the ribosome density to the gene’s translation level, by dividing the density at each nucleotide by the median density of the gene.
4. Calculate the Normalized Codon Density, by averaging the normalized densities of the three nucleotides in each codon.
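Steps A1-A4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and the input layout (one list of per-nucleotide densities per gene, starting at the first nucleotide of the first codon) are hypothetical and not part of the protocol.

```python
from statistics import median

def normalized_codon_density(nt_density, min_median=0.2):
    """Return the normalized density of each codon, or None if the gene is discarded.

    Assumption: nt_density[0] is the first nucleotide of the first codon,
    so codon i covers nt_density[3*i : 3*i + 3].
    """
    # Step 1: median ribosome density over the gene's nucleotides.
    med = median(nt_density)

    # Step 2: discard genes with insufficient density (0.2 is the example value).
    # A median of zero is also rejected, since it cannot be used as a divisor.
    if med < min_median or med == 0:
        return None

    # Steps 3-4: divide by the gene median, then average over each codon.
    # A trailing partial codon, if any, is ignored.
    n_codons = len(nt_density) // 3
    return [sum(nt_density[3 * i:3 * i + 3]) / 3 / med for i in range(n_codons)]
```

A gene for which the function returns None is simply left out of sections B and C.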
B. Defining internal Pause Codons.
1. For each gene, discard the normalized density values of the first and last 4 codons.
2. Collect the values of all other normalized codon densities.
3. Define a density cutoff from the pooled values, for example the value that separates the top 5% from the rest (any other threshold may be used). Codons with density values above this cutoff may be considered Pause Codons, i.e. codons where the ribosome pauses.
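A minimal Python sketch of steps B1-B3 is given below. The function name and the dictionary layout (gene name mapped to its list of normalized codon densities) are assumptions made for illustration. The cutoff is taken as the lowest pooled value that still falls in the top fraction, so tied values can push the selected set slightly above 5%.

```python
def find_pause_codons(codon_density_by_gene, trim=4, top_fraction=0.05):
    """Return (cutoff, {gene: [indices of Pause Codons]}).

    codon_density_by_gene: hypothetical input, gene name -> list of
    normalized codon densities for that gene.
    """
    # Steps 1-2: drop the first and last `trim` codons of each gene, pool the rest.
    pooled = []
    for dens in codon_density_by_gene.values():
        pooled.extend(dens[trim:len(dens) - trim])
    if not pooled:
        return None, {}

    # Step 3: the cutoff is the lowest density still inside the top fraction.
    pooled.sort(reverse=True)
    n_top = max(1, round(len(pooled) * top_fraction))
    cutoff = pooled[n_top - 1]

    # Internal codons at or above the cutoff are reported as Pause Codons.
    pauses = {}
    for gene, dens in codon_density_by_gene.items():
        pauses[gene] = [i for i in range(trim, len(dens) - trim) if dens[i] >= cutoff]
    return cutoff, pauses
```

Codon indices are 0-based and refer to the untrimmed gene, so they can be mapped back to nucleotide positions directly.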
C. Defining the Pause Codons that have an upstream Shine-Dalgarno-like sequence.
(Use the attached tenmer table, which specifies the binding energy of every possible 10-mer in an mRNA to the E. coli anti-Shine-Dalgarno sequence, calculated using the Vienna Package’s Subopt program.)
1. For each nucleotide in the mRNA, take the 10-nucleotide sequence that starts 7 nucleotides upstream of it and ends 2 nucleotides downstream of it (7 + 1 + 2 = 10 nucleotides).
2. Find the delta-G value for binding of this 10-mer in the table. This is the calculated binding energy to the anti-Shine-Dalgarno at the above position.
3. Similar to the analysis above, define an energetic threshold for considering a site an internal Shine-Dalgarno-like sequence, by pooling together the binding energies of all nucleotides in the genome’s mRNAs. The bottom 5% (the most negative energies, i.e. the strongest predicted binding) may be defined as Shine-Dalgarno-like sites.
4. Finally, for every Pause Codon in the genome, scan the mRNA sequence for a Shine-Dalgarno-like site located 8-11 nucleotides upstream of any of the codon’s three nucleotides. Pause Codons that have such a site may be considered ‘programmed pauses’, where the Shine-Dalgarno-like mRNA sequence may drive ribosome slowdown.
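Steps C1-C4 could be implemented along the following lines. This is a sketch under several assumptions that the protocol does not fix: the tenmer table is assumed to be loaded into a dictionary mapping each 10-mer string to its delta-G, using the same alphabet as the mRNA sequences; the position of a site is taken to be the nucleotide to which the energy was assigned in step 1; and `cds_start` is a hypothetical offset of the first codon within the mRNA sequence. All function names are illustrative.

```python
def site_energies(mrna, tenmer_energy, upstream=7, downstream=2):
    """Steps 1-2: map each nucleotide position to the delta-G of its 10-mer.

    Positions whose window would run off either end of the mRNA, or whose
    10-mer is missing from the table (e.g. ambiguous bases), are skipped.
    """
    energies = {}
    for pos in range(upstream, len(mrna) - downstream):
        window = mrna[pos - upstream:pos + downstream + 1]
        energy = tenmer_energy.get(window)
        if energy is not None:
            energies[pos] = energy
    return energies

def sd_like_sites(energies_by_gene, bottom_fraction=0.05):
    """Step 3: pool all energies and keep the most negative fraction.

    Returns (cutoff, {gene: set of Shine-Dalgarno-like positions}).
    Ties at the cutoff are all kept, so slightly more than the requested
    fraction may be selected.
    """
    pooled = sorted(e for en in energies_by_gene.values() for e in en.values())
    if not pooled:
        return None, {}
    n_bottom = max(1, round(len(pooled) * bottom_fraction))
    cutoff = pooled[n_bottom - 1]
    sites = {gene: {pos for pos, e in en.items() if e <= cutoff}
             for gene, en in energies_by_gene.items()}
    return cutoff, sites

def programmed_pauses(pause_codons, sd_sites, cds_start=0, min_dist=8, max_dist=11):
    """Step 4: keep Pause Codons with an SD-like site 8-11 nt upstream
    of any of their three nucleotides (0-based codon indices)."""
    hits = []
    for codon in pause_codons:
        first_nt = cds_start + 3 * codon
        if any((nt - dist) in sd_sites
               for nt in range(first_nt, first_nt + 3)
               for dist in range(min_dist, max_dist + 1)):
            hits.append(codon)
    return hits
```

In use, `site_energies` would be run once per gene, the results collected into one dictionary for `sd_like_sites`, and `programmed_pauses` then called per gene with that gene's Pause Codons and its set of Shine-Dalgarno-like positions.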