Analysis of DREADS/SHREADS sequence data

Zarmik Moqtaderi; Joseph V. Geisberg; Kevin Struhl

Concise Method

This method is extracted from research article: Mol Cell, Oct 2018

Extensive structural differences of closely related 3’ mRNA isoforms: links to Pab1 binding and mRNA stability

DOI: 10.1016/j.molcel.2018.08.044

De-multiplexed individual libraries were analyzed as follows. Before mapping, the four random nucleotides added during library construction were removed from the beginning of each R1 read. (Due to ligation bias, we observed strong preferences in the distribution of the 4-mers ligated to the enzymatically shortened poly(A) sequence at R1 ends; these distributions were essentially identical in the no-DMS and DMS-treated samples.) Any initial T residues at the 5’ end of the trimmed read (corresponding to the poly(A) tail and any genomically encoded A residues at the 3’ end of the RNA) were then counted and removed, and an integer giving the number of initial T nucleotides was appended to both the R1 and R2 read identifiers for future reference. Reads lacking initial T nucleotides, reads with ambiguous bases, and reads with fewer than 9 nt remaining after removal of Ts were excluded from further analysis. For all others, the first 17 nt of the remaining sequence were used for mapping, to maximize unique mapping while minimizing overlap with the primer on the far side of short sequences.

For the R2 member of the read pair, the sequences corresponding to the four possible 4-mers were removed, and the sequence from the second nt to the 18th nt was retained for mapping. In our experience, the first nt, representing the final nt added by reverse transcriptase during library construction, is frequently a mismatch in DMS-treated samples; we therefore used the following nt as the starting point for mapping and compensated for this shift after mapping.

Mapping of the remaining paired sequence reads to a mixed reference genome consisting of the S. cerevisiae and S. pombe genomes was performed using bowtie (Langmead et al., 2009), accepting unique genomic matches only and allowing no more than one mismatched nt.
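
The read-trimming rules described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' code: the function names, the `_T<n>` identifier suffix, the choice to check the whole R1 read for ambiguous bases, and the treatment of the R2 4-mer as a fixed-length prefix are all hypothetical.

```python
from typing import Optional, Tuple

RANDOM_PREFIX_LEN = 4  # random nucleotides added during library construction
MIN_REMAINING = 9      # minimum nt that must remain after removing initial Ts
MAP_LEN = 17           # number of nt passed on to the aligner


def preprocess_r1(seq: str) -> Optional[Tuple[int, str]]:
    """Return (initial_T_count, mapping_sequence), or None if the read is excluded."""
    seq = seq.upper()
    # Assumption: any non-ACGT character anywhere in the read counts as ambiguous.
    if any(base not in "ACGT" for base in seq):
        return None
    body = seq[RANDOM_PREFIX_LEN:]       # drop the random 4-mer
    trimmed = body.lstrip("T")           # drop the initial T run
    t_count = len(body) - len(trimmed)   # number of initial Ts removed
    if t_count == 0 or len(trimmed) < MIN_REMAINING:
        return None
    return t_count, trimmed[:MAP_LEN]


def preprocess_r2(seq: str) -> str:
    """Drop the leading 4-mer (assumed to be a prefix), skip the first nt, keep nt 2-18."""
    body = seq.upper()[RANDOM_PREFIX_LEN:]
    return body[1:18]


def tag_read_id(read_id: str, t_count: int) -> str:
    """Append the initial-T count to a read identifier (suffix format is hypothetical)."""
    return f"{read_id}_T{t_count}"
```

For example, `preprocess_r1("ACGT" + "TTTTT" + "GATTACAGATTACAGATTACA")` yields a T count of 5 and the first 17 nt of the remaining sequence, while a read with no initial T after the 4-mer returns `None`.
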
Next, uniquely mapped pairs were screened to remove any that were not demonstrably from poly(A) addition sites: we discarded any R1 read that did not have more initial Ts than the number of genomically encoded As at the corresponding position. The 5’ (R2) boundary for each mapped pair was adjusted backwards by two nt: one to compensate for our having started the mapping from the second nt, and another to identify the potentially DMS-modified position. The total number of completely processed reads was scaled by a small multiplier to give equal numbers of reads in the DMS-treated and untreated samples. This scaling factor was not large; for example, for the two replicates of the main data sets, the initial unscaled poly(A)-RNA-derived read counts for the DMS sample and the untreated control differed by less than 10%.
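
A minimal sketch of these three post-mapping steps is given below. It rests on assumptions that the text does not state: that "backwards" means toward the 5’ end of the transcript (lower coordinates on the plus strand, higher on the minus strand), and that scaling is applied by multiplying per-position counts in one sample so that its total matches the other. All function names are hypothetical.

```python
from typing import Dict, Hashable


def passes_polya_filter(initial_t_count: int, genomic_a_count: int) -> bool:
    """Keep a read only if it has more initial Ts than genomically encoded As."""
    return initial_t_count > genomic_a_count


def adjust_r2_boundary(position: int, strand: str) -> int:
    """Shift the mapped R2 5' boundary two nt toward the transcript 5' end.

    One nt compensates for mapping from the second nt of the read; the other
    points at the potentially DMS-modified position. The strand handling is
    an assumption about the coordinate convention.
    """
    if strand == "+":
        return position - 2
    if strand == "-":
        return position + 2
    raise ValueError(f"unexpected strand: {strand!r}")


def scale_to_total(counts: Dict[Hashable, float], target_total: float) -> Dict[Hashable, float]:
    """Multiply every count by one factor so that the counts sum to target_total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot scale an empty sample")
    factor = target_total / total
    return {key: value * factor for key, value in counts.items()}
```

With totals that differ by less than 10%, as reported for the main data sets, the factor computed in `scale_to_total` stays close to 1.
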

For every gene, we thus identified all 3’ RNA endpoints in the sequenced library, and for each of these 3’ endpoints, we tabulated all corresponding paired 5’ fragment ends representing the endpoints of the reverse transcription reactions. Frequencies of these associated 5’ endpoints for untreated and DMS-treated cells were compared to identify positions with DMS-induced modifications capable of inhibiting the progress of reverse transcriptase.
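
The tabulation described above could be organized as in the following sketch. The data layout (a mapping from each 3’ endpoint to a counter of paired 5’ endpoints) and the function names are hypothetical, and the text does not specify the statistic used to compare frequencies; the simple difference of fractions here is purely illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def tabulate_endpoints(pairs: Iterable[Tuple[Hashable, int]]) -> Dict[Hashable, Counter]:
    """Group paired 5' endpoints by the 3' endpoint they were sequenced with."""
    table: Dict[Hashable, Counter] = defaultdict(Counter)
    for three_prime_end, five_prime_end in pairs:
        table[three_prime_end][five_prime_end] += 1
    return table


def endpoint_frequencies(counter: Counter) -> Dict[int, float]:
    """Convert raw 5' endpoint counts for one 3' isoform into fractions."""
    total = sum(counter.values())
    return {pos: n / total for pos, n in counter.items()}


def compare_frequencies(dms: Counter, untreated: Counter) -> Dict[int, float]:
    """Illustrative comparison: DMS fraction minus untreated fraction per position."""
    f_dms = endpoint_frequencies(dms)
    f_ctl = endpoint_frequencies(untreated)
    positions = sorted(set(f_dms) | set(f_ctl))
    return {pos: f_dms.get(pos, 0.0) - f_ctl.get(pos, 0.0) for pos in positions}
```

Positions with a clearly positive difference would be candidates for DMS-induced reverse transcriptase stops under this illustrative measure; any real analysis would need the thresholds and statistics of the original study.
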

For D. hansenii native and JYAC7 (containing the D. hansenii YAC) strains, mapping was performed using bowtie to a combined reference file consisting of the D. hansenii Deha2, S. cerevisiae SacCer3, and S. pombe EF2 genomes. For K. lactis native and JYAC2 (containing the K. lactis YAC) strains, mapping was performed using bowtie to a combined reference file consisting of the K. lactis Klla0, S. cerevisiae SacCer3, and S. pombe EF2 genomes. To eliminate any ambiguity arising from possible cross-species matching of homologous genes in the YAC strains, only positions that mapped uniquely across the entire combined reference genome were considered. All subsequent data preparation and normalization steps were performed as described above.
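
One way to build such a combined reference is to concatenate the individual FASTA files while prefixing each sequence name with a genome label, so that every alignment can later be attributed to its species. The sketch below assumes this labeling scheme; the text does not describe how the combined file was assembled, and the function name and `label|name` header format are hypothetical.

```python
from typing import Dict, Iterable, Iterator


def merge_references(genomes: Dict[str, Iterable[str]]) -> Iterator[str]:
    """Yield FASTA lines for a combined reference with genome-prefixed headers.

    `genomes` maps a genome label to an iterable of FASTA lines, for example
    an open file handle. Sequence lines pass through unchanged.
    """
    for label, lines in genomes.items():
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines between records
            if line.startswith(">"):
                yield f">{label}|{line[1:]}"
            else:
                yield line
```

The merged lines can be written to a single file and indexed once, so that the aligner's uniqueness criterion is applied across all genomes in the combined reference rather than within each genome separately.
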
