Strain information, sequencing procedures, and detection of indels.

Charles C. Traverse
Howard Ochman

We assayed the transcriptomes of eight biological replicates of Escherichia coli MG1655 and two biological replicates of Buchnera aphidicola LSR1 by using the CirSeq library preparation protocol (15). In this method, mRNA is sheared into 80- to 100-bp fragments, which are then circularized, primed using random hexamers, and reverse transcribed to generate cDNA that contains multiple linked repeats of the mRNA fragment. cDNAs containing these repeats were sequenced using Illumina MiSeq 300-nt read lengths to capture at least three repeats within a sequencing read. Reads were processed by the CirSeq_v3 pipeline (http://andino.ucsf.edu/CirSeq) to generate a consensus sequence for each read (14). CirSeq_v3 was run with its default settings and a quality score cutoff of 20. CirSeq_v3 uses Bowtie 2 (37) to align reads to a reference genome (NC_000913.3 for E. coli and NZ_ACFK01000001 for Buchnera). We also edited the run.sh script to retain the intermediate output (9_alignment.sam and 10_alignment.sam) generated in the CirSeq_v3 pipeline, since these outputs contain candidate insertions and deletions. Additional strain information and library preparation protocols have been described elsewhere (4). The data are publicly available from the NCBI Sequence Read Archive (SRA) [see “Accession number(s),” below]. (The insertions and deletions used in our analyses are provided as Data Sets in the supplemental material.)
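The idea of collapsing linked repeats into one consensus can be illustrated with a minimal sketch. This is not the CirSeq_v3 implementation: the function name repeat_consensus, the assumption that the repeat length is already known, and the strict-majority rule are all hypothetical choices made for illustration. Only the quality cutoff of 20 comes from the text.

```python
from collections import Counter


def repeat_consensus(read, quals, repeat_len, min_qual=20):
    """Majority-vote consensus across tandem repeats in one read.

    Hypothetical helper: assumes the repeat length is already known and
    that the repeats are perfectly in phase, which a real pipeline has
    to work out for itself.
    """
    n_repeats = len(read) // repeat_len
    consensus = []
    for offset in range(repeat_len):
        votes = Counter()
        for r in range(n_repeats):
            i = r * repeat_len + offset
            # Ignore base calls below the quality cutoff.
            if quals[i] >= min_qual:
                votes[read[i]] += 1
        if not votes:
            consensus.append("N")
            continue
        base, count = votes.most_common(1)[0]
        # Accept a base only if a strict majority of all repeats support it.
        consensus.append(base if 2 * count > n_repeats else "N")
    return "".join(consensus)


# Three repeats of a 4-nt fragment; the third repeat carries one miscall.
example_read = "ACGTACGTACTT"
example_quals = [30] * len(example_read)
example_consensus = repeat_consensus(example_read, example_quals, repeat_len=4)
```

In this toy input the miscalled base appears in only one of three repeats, so it is outvoted and does not reach the consensus. A position where too few repeats pass the quality cutoff is reported as N rather than guessed.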

Generating a consensus sequence from the multiple repeats within a single read removes sequencing errors, which appear as changes in only one of the repeats. Insertion and deletion rates of Illumina sequencing are very low (38), and only those insertions or deletions that occurred at identical positions and were of equal size in fully aligned repeats were considered authentic. Because sequencing reads originate from the reverse transcription of circularized mRNA fragments primed with random hexamers, the actual orientation of sequences can be determined only after multiple rounds of sequence alignment. This process generates many intermediate alignment files (9_alignment.sam and 10_alignment.sam) that contain improperly mapped reads. To detect insertions and deletions, we searched these files to identify reads that contained indels flanked on both sides by fully aligned sequences. One strategy for determining the correct orientation of a read in the CirSeq_v3 pipeline was to sequentially move each base from one end of the read to the other (14). When each iteration was mapped to the genome, many reads that initially contained insertions or deletions eventually yielded an aligned sequence devoid of indels. To identify insertions and deletions, we retained those reads that had the highest alignment score among the iterations of a read while still containing an insertion or deletion. Finally, only those insertions receiving quality scores of ≥20 and only those deletions that were flanked on both sides by bases receiving quality scores of ≥20 were considered. Additionally, we sequenced the genome of the parental strain of E. coli to confirm that no errors were attributable to genomic mutations. Statistical analyses were performed with GraphPad Prism and R.
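The flanking and quality criteria described above can be sketched on a SAM CIGAR string. This is an illustrative sketch, not the authors' script: the function names flanked_indels and passes_quality are hypothetical, and two points are assumptions of the sketch, namely that "fully aligned" means no soft or hard clipping, and that a deletion is judged by the single read base on each side of it. Only the cutoff of 20 is taken from the text.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def flanked_indels(cigar, min_flank=1):
    """Return (op, length, read_pos) for indels with aligned flanks.

    Assumption of this sketch: a read with any soft (S) or hard (H)
    clipping is not "fully aligned" and is skipped entirely.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if any(op in "SH" for _, op in ops):
        return []
    found = []
    read_pos = 0  # 0-based index into the read sequence
    for k, (n, op) in enumerate(ops):
        if op in "ID" and 0 < k < len(ops) - 1:
            left_n, left_op = ops[k - 1]
            right_n, right_op = ops[k + 1]
            if (left_op in "M=X" and right_op in "M=X"
                    and left_n >= min_flank and right_n >= min_flank):
                found.append((op, n, read_pos))
        # Only these operations consume bases of the read.
        if op in "MI=X":
            read_pos += n
    return found


def passes_quality(indel, quals, min_qual=20):
    """Apply the quality cutoff to one indel from flanked_indels."""
    op, n, pos = indel
    if op == "I":
        # Every inserted base must meet the cutoff.
        return all(q >= min_qual for q in quals[pos:pos + n])
    # Deletion: check the read bases immediately before and after it.
    return quals[pos - 1] >= min_qual and quals[pos] >= min_qual


# Toy example: an 80-nt read with a 2-nt insertion after 40 aligned bases.
example_quals = [30] * 80
example_hits = [i for i in flanked_indels("40M2I38M")
                if passes_quality(i, example_quals)]
```

Quality values here are integer Phred scores; in a SAM file they would first be decoded from the QUAL string (ASCII code minus 33). The sketch does not cover the read-rotation step or the choice of the best-scoring iteration.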
