Body parts of single B. minax males were dissected to obtain DNA for whole-genome sequencing. Total DNA was extracted using the Magen HiPure Insect DNA Kit (D3129-02) (Guangzhou, China) according to the manufacturer’s instructions. Samples were sent to GeneDenovo (Guangzhou, China) (https://www.genedenovo.com/) for library construction and genome sequencing. Sequencing combined next-generation sequencing (NGS) on the HiSeq 2500 with third-generation sequencing on the PacBio RSII. Briefly, five NGS libraries were constructed: two duplicated short-fragment libraries (450 bp and 800 bp) and three long mate-pair libraries (2 kb, 5 kb, and 10 kb). Together these produced 102 Gb of sequence (Supplementary file 1, Table S1-1), an estimated 300× coverage of the genome (Supplementary file 1, Figure S1-1). The genome size was estimated at about 331 Mb by k-mer analysis (k = 17) of the short-fragment libraries (Supplementary file 1, Figure S1-2). For third-generation sequencing, two libraries were constructed and five SMRT cells were sequenced, yielding 5 Gb of raw data in total and a genome coverage of 15× (Supplementary file 1, Table S1-2). SMRT Analysis software (version 2.3.0) (https://www.pacb.com/products-and-services/analytical-software/smrt-analysis/) provided by PacBio was used for sequencing quality control. The genome was assembled in two steps: (1) Platanus (version 1.2.1)35 was used to assemble the Illumina data and GapCloser (v1.10)36 was used to extend the contig length; (2) PBJelly (PBSuite_15.8.24)37 was used to extend the scaffolds and fill gaps using the third-generation sequencing data. In total, the assembly spanned 340 Mb, which is close to the estimated genome size. The numbers of contigs/scaffolds were 38,509/8,019 and the contig/scaffold N50 reached 23 kb/1.6 Mb (Supplementary file 1, Table S1-3).
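The coverage and N50 figures reported above follow from simple arithmetic, which can be sketched as below. This is a minimal illustration and not the pipeline used in the study: the function names `coverage` and `n50` and the toy length list are hypothetical, and only the totals (102 Gb, 5 Gb, 331 Mb) are taken from the text.

```python
def coverage(total_bases, genome_size):
    """Sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def n50(lengths):
    """Return the N50 of a set of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Totals reported in the text; 331 Mb is the k-mer based size estimate.
ngs_depth = coverage(102e9, 331e6)    # short-read libraries
pacbio_depth = coverage(5e9, 331e6)   # PacBio SMRT cells

# Hypothetical toy lengths (arbitrary units), only to show how n50 behaves.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_lengths)

print(f"NGS depth: {ngs_depth:.0f}x, PacBio depth: {pacbio_depth:.0f}x")
print(f"Toy N50: {toy_n50}")
```

With the reported totals, the computed depths are roughly 308× and 15×, consistent with the approximate 300× and 15× stated above. Applying `n50` to the real contig and scaffold lengths would require the assembly FASTA, which is not reproduced here.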
The genome sequences have been submitted to NCBI (SAYV00000000). Repetitive sequences accounted for 20.97% of the genome, as identified by de novo prediction (RepeatModeler38 and LTR-FINDER39), RepBase40-based homology prediction (RepeatMasker41 and RepeatProteinMask42), and Tandem Repeats Finder (TRF43) (Supplementary file 1, Table S1-4 and Table S1-5). After the repetitive sequences were excluded, gene structure was predicted by combining de novo prediction, homology-based gene prediction, and cDNA/EST-based prediction (Supplementary file 1, Figure S1-3). Three programs (Augustus 2.744, Genscan 1.045, and Glimmer HMM 3.0.146) were used for de novo prediction of gene models. MAKER 2.28 software47 was applied to integrate all gene sets into a single non-redundant and more complete set. Lastly, BUSCO48 was used to assess the reliability of the gene models; the complete single-copy BUSCO scores reached 98.5% in the assembled gene set and 94.9% in the annotated gene set (Supplementary file 1, Table S1-6). After all these analyses, a set of 12,533 gene models was obtained and used for identification of chemosensory genes.