Selecting an appropriate alignment tool for CAGE-Seq data can be difficult because of the short read length. We therefore generated simulated single-end sequence datasets with a read length of 27 bp, similar to the length of the trimmed CAGE-Seq reads, and compared the alignment quality of BWA and Bowtie2 [52] (version 2/2.3.4.3). Simulated datasets were generated from chr1 of the Bos taurus genome (GenBank: ARS-UCD1.2) using dwgsim [53], with the default per-base sequencing error rate of 0.02. Three datasets, each comprising 20 samples, were generated with average sequencing depths of 10–25x (high), 5–10x (medium), and 1–5x (low). The sequencing coverage of each sample in each dataset was drawn at random within the corresponding coverage bounds. All simulated reads were mapped to chr1 of the Bos taurus genome assembly.
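The sampling design described above can be illustrated with a minimal sketch. It is not the authors' script: the text says only that coverage was chosen at random within the bounds, so the uniform distribution, the seed, and the helper names (`draw_coverages`, `reads_for_coverage`, `build_design`) are assumptions made for illustration. The reference length is left as a parameter rather than filled in.

```python
import random

# Coverage bounds (x) for the three simulated datasets, as given in the text.
COVERAGE_BOUNDS = {"high": (10.0, 25.0), "medium": (5.0, 10.0), "low": (1.0, 5.0)}
READ_LENGTH = 27          # bp, from the text
SAMPLES_PER_DATASET = 20  # from the text


def draw_coverages(bounds, n_samples, rng):
    """Draw one target coverage per sample.

    ASSUMPTION: a uniform distribution within the bounds; the text only
    states that coverage was chosen at random.
    """
    low, high = bounds
    return [rng.uniform(low, high) for _ in range(n_samples)]


def reads_for_coverage(coverage, reference_length, read_length=READ_LENGTH):
    """Single-end reads needed: coverage * reference length / read length."""
    return round(coverage * reference_length / read_length)


def build_design(reference_length, seed=0):
    """Return {dataset: [(coverage, n_reads), ...]} for all three datasets."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    design = {}
    for name, bounds in COVERAGE_BOUNDS.items():
        coverages = draw_coverages(bounds, SAMPLES_PER_DATASET, rng)
        design[name] = [
            (c, reads_for_coverage(c, reference_length)) for c in coverages
        ]
    return design
```

Calling `build_design(reference_length)` with the length of the simulated region gives, for each sample, a target coverage and the corresponding read count that could then be passed to the read simulator.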
The parameters used for running BWA were the same as those used for the real data. For Bowtie2, the default parameters were used. Two standard performance measures, precision and recall, were used to evaluate the aligners. Recall (sensitivity) is the number of correctly aligned reads divided by the total number of reads that should have been aligned, and precision is the number of correctly aligned reads divided by the total number of aligned reads. Both measures were calculated using the dwgsim_eval program from dwgsim [53]. To assess the overall performance of the two aligners, the area under the precision-recall curve (PR-AUC) was computed. PR-AUC ranges between 0 and 1, with a larger area indicating better performance. Under our evaluation criteria, the overall score was slightly higher for BWA than for Bowtie2 (0.41 ± 0.0 vs. 0.32 ± 0.0008), indicating higher accuracy for BWA with the sequencing parameters used.
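The two measures and the area under the curve can be written out as a short sketch. The precision and recall definitions follow the text. How the curve points were obtained and integrated is not stated, so the trapezoidal rule over (recall, precision) pairs is an assumption, as are the function names `precision_recall` and `pr_auc`. The points would typically come from evaluating the alignments at several thresholds, for example mapping-quality cutoffs.

```python
def precision_recall(n_correct, n_aligned, n_expected):
    """Precision and recall as defined in the text.

    n_correct  : correctly aligned reads
    n_aligned  : all aligned reads (correct or not)
    n_expected : reads that should have been aligned
    """
    # Convention chosen here (an assumption): precision is 1.0 when
    # nothing was aligned, recall is 0.0 when nothing was expected.
    precision = n_correct / n_aligned if n_aligned else 1.0
    recall = n_correct / n_expected if n_expected else 0.0
    return precision, recall


def pr_auc(points):
    """Area under a precision-recall curve by the trapezoidal rule.

    points: iterable of (recall, precision) pairs, one per threshold.
    ASSUMPTION: trapezoidal integration over the observed recall range;
    the source does not state how the area was computed.
    """
    pts = sorted(points)  # order by increasing recall
    area = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area
```

With this convention, an aligner whose curve stays at precision 1 across the full recall range scores 1, and a curve that covers only part of the recall range scores correspondingly less.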