Before mapping, reads were preprocessed with FASTX-Toolkit (v0.0.14): adaptor sequences were removed, and reads were trimmed and filtered according to quality. Reads were aligned with BWA-MEM (v0.7.12) using the -M option to mark shorter split hits as secondary alignments, with all other parameters left at their defaults (84). Depending on the strain background, reads were mapped to either the W3110 genome [National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) Database accession: NC_007779.1] or the MG1655 genome (NCBI RefSeq accession: NC_000913.3). Reads that had multiple primary hits or low mapping quality were discarded. Potential PCR duplicates were removed with Picard MarkDuplicates: when multiple read pairs mapped to identical external coordinates, only the pair with the highest mapping quality was retained. BedGraph files reporting the physical genomic coverage (which includes the unsequenced part between read pairs) in each 2000-bp bin were generated from BAM files using deepTools (85). For ChIP-seq data, the read counts in the bedGraph files were normalized to the median coverage. For whole-genome sequencing data, the read counts in each bin were first normalized against total read counts, and then the log2 ratios of DSB and no-DSB samples were calculated. Plots were generated in R. Genomic regions that contain ribosomal RNA gene clusters had very few uniquely mapped reads and were therefore excluded from the plots. All sequencing data are available in the European Nucleotide Archive (ENA) under study accession no. PRJEB14145.
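The two normalization steps described above can be illustrated with a minimal Python sketch. This is not the authors' code (the plots in the study were produced in R); the function names `normalize_to_median` and `log2_ratio` are hypothetical, and several details are assumptions: per-bin counts are passed in as plain lists, the "total read count" defaults to the sum over the supplied bins unless given explicitly, the ratio is taken as DSB over no-DSB, and bins with no coverage in either sample are reported as NaN.

```python
import math
from statistics import median


def normalize_to_median(coverage):
    """ChIP-seq style normalization: divide each bin by the median bin coverage."""
    med = median(coverage)
    if med == 0:
        raise ValueError("median coverage is zero; cannot normalize")
    return [c / med for c in coverage]


def log2_ratio(dsb_counts, no_dsb_counts, dsb_total=None, no_dsb_total=None):
    """Whole-genome sequencing style comparison.

    Each sample is first scaled by its total read count, then the per-bin
    log2 ratio is computed. Assumptions: the totals default to the sum of
    the supplied bins, and the ratio direction is DSB / no-DSB.
    """
    if len(dsb_counts) != len(no_dsb_counts):
        raise ValueError("both samples must have the same number of bins")
    dsb_total = sum(dsb_counts) if dsb_total is None else dsb_total
    no_dsb_total = sum(no_dsb_counts) if no_dsb_total is None else no_dsb_total
    if dsb_total <= 0 or no_dsb_total <= 0:
        raise ValueError("total read counts must be positive")

    ratios = []
    for d, c in zip(dsb_counts, no_dsb_counts):
        d_norm = d / dsb_total
        c_norm = c / no_dsb_total
        if d_norm <= 0 or c_norm <= 0:
            # Assumption: bins without coverage are left undefined.
            ratios.append(float("nan"))
        else:
            ratios.append(math.log2(d_norm / c_norm))
    return ratios


if __name__ == "__main__":
    # Toy per-bin counts, made up purely for illustration.
    print(normalize_to_median([2, 4, 6]))
    print(log2_ratio([10, 20, 10], [20, 20, 40]))
```

In practice the per-bin values would be read from the bedGraph files produced by deepTools, one value per 2000-bp bin, with bins overlapping ribosomal RNA gene clusters dropped before plotting.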