To generate DS1.1, we randomly extracted 500 sequences of 200 bp from the hg38 build of the human genome. Barcode, megaprimer, and long terminal repeat (LTR) sequences were added at the beginning of each genomic sequence, and a linker cassette (LC) was added at the end. We generated this dataset in both FASTQ and FASTA formats because VISA, HISAP, and QuickMap accept only FASTA input. HISAP and QuickMap support genome builds only up to hg19; therefore, genomic coordinates of the test dataset were converted to the hg38 build with UCSC liftOver (ref. 25) for comparison and adjusted by manual inspection for sequence differences between hg19 and hg38. DS1.2 and DS1.4 were generated in the same way, except that the sequences were 250 bp long. DS1.5, DS1.6, DS1.7, and DS1.8 were generated with both IS-containing and non-IS-containing sequences.
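The read construction described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the adapter sequences, the toy in-memory genome, the read-naming scheme, and the constant FASTQ quality string are all hypothetical placeholders, and fragments are drawn from the forward strand only.

```python
import random

# Placeholder adapter sequences (hypothetical; the real ones are not given in the text).
BARCODE = "ACGTACGT"
MEGAPRIMER = "TTGACCTTGAC"
LTR = "GGAAAATCTCTAGCA"
LINKER_CASSETTE = "AGTGGCACAGCAGTTAGG"


def sample_fragments(genome, n, length, rng):
    """Draw n random fixed-length fragments, keeping chromosome and 0-based start."""
    chroms = [c for c, seq in genome.items() if len(seq) >= length]
    # Weight each chromosome by its number of valid start positions.
    weights = [len(genome[c]) - length + 1 for c in chroms]
    fragments = []
    for _ in range(n):
        chrom = rng.choices(chroms, weights=weights, k=1)[0]
        start = rng.randrange(len(genome[chrom]) - length + 1)
        fragments.append((chrom, start, genome[chrom][start:start + length]))
    return fragments


def build_read(fragment_seq):
    """Barcode, megaprimer, and LTR go in front; the linker cassette goes at the end."""
    return BARCODE + MEGAPRIMER + LTR + fragment_seq + LINKER_CASSETTE


def to_fasta(records):
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def to_fastq(records, quality_char="I"):
    # A constant quality string is an assumption made only to produce valid FASTQ.
    return "".join(
        f"@{name}\n{seq}\n+\n{quality_char * len(seq)}\n" for name, seq in records
    )


# Toy stand-in for hg38: two random "chromosomes" held in memory.
rng = random.Random(0)
toy_genome = {
    "chr1": "".join(rng.choice("ACGT") for _ in range(5000)),
    "chr2": "".join(rng.choice("ACGT") for _ in range(3000)),
}

fragments = sample_fragments(toy_genome, n=500, length=200, rng=rng)
reads = [
    (f"IS_{i}|{chrom}:{start}", build_read(seq))
    for i, (chrom, start, seq) in enumerate(fragments, 1)
]
fasta_text = to_fasta(reads)
fastq_text = to_fastq(reads)
```

Storing the source chromosome and start position in each read name keeps the ground truth alongside the simulated read, which is what makes the later comparison of reported and true coordinates possible.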
For the targeted sequencing mode, we generated DS2.1 by randomly extracting 5,600 sequences from the hg38 build of the human genome to produce a paired-end FASTQ dataset with 250-bp reads. A 50-bp vector sequence was added at the end of each sequence in one file. We additionally generated four in silico datasets (DS2.3, DS2.4, DS2.5, and DS2.6) by extracting 1,000 random 1-kb sequences from the hg38 build. The vector was inserted at a known genomic location in the middle of each fragment. This library was used with a freely available program, the profile-based Illumina pair-end Reads Simulator (pIRS) (ref. 26), with default parameters to simulate four 100-bp sequencing datasets. Each dataset samples the library at a different coverage: 1×, 10×, 100×, or 1,000×. DS2.3, DS2.4, DS2.5, and DS2.6 contain 37,200, 373,200, 3,735,000, and 37,350,000 sequences, respectively.
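The relationship between coverage and dataset size can be checked with a short calculation. The sketch below assumes the usual definition, coverage = (number of reads × read length) / library length, and assumes that each reported sequence is a single 100-bp read; the function names are illustrative and this is not how pIRS computes its output internally.

```python
READ_LENGTH = 100  # bp, as stated for the simulated datasets

# Reported sequence counts per coverage value (DS2.3 to DS2.6).
reported = {1: 37_200, 10: 373_200, 100: 3_735_000, 1000: 37_350_000}


def reads_for_coverage(library_bp, coverage, read_length=READ_LENGTH):
    """Number of reads needed to cover a library of library_bp at the given depth."""
    return round(coverage * library_bp / read_length)


def implied_library_bp(n_reads, coverage, read_length=READ_LENGTH):
    """Library length implied by a read count, under the same definition."""
    return n_reads * read_length / coverage


implied = {cov: implied_library_bp(n, cov) for cov, n in reported.items()}
for cov, size in implied.items():
    print(f"{cov:>5}x coverage implies a library of {size:,.0f} bp")
```

Under these assumptions the four counts imply a library of roughly 3.72 to 3.74 Mb, and the counts scale almost exactly tenfold between datasets. That is larger than the 1 Mb of extracted genomic sequence, which would be consistent with the inserted vector contributing to the library length; the vector length itself is not stated here.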
All analyses with the Linux-based tools were performed on a 10.04 LTS Linux machine with 48 GB of RAM and an eight-core Xeon central processing unit (CPU). The criteria used to estimate the statistical measures are detailed in Section S4.