2.3. Genome Assembly and Annotation

Konstantin V. Moiseenko
Olga A. Glazunova
Natalia V. Shakhova
Olga S. Savinova
Daria V. Vasina
Tatiana V. Tyazhelova
Nadezhda V. Psurtseva
Tatiana V. Fedorova

Shotgun sequencing produced 2 × 47,868,586 paired-end reads (2 × 100 bp) with an insert size of 300–500 bp. The reads were processed with CLC Genomics Workbench 11.0 (Qiagen, Valencia, CA, USA) as follows: (1) adapters were removed from all reads; (2) all reads were quality-trimmed; (3) reads were subsampled so that the average coverage did not exceed 100×; (4) the remaining reads were assembled de novo, and the resulting contigs were scaffolded.
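The subsampling in step (3) comes down to simple arithmetic, sketched below in Python. This is only an illustration and not the CLC Genomics Workbench procedure. The genome size is not stated in this section, so `ASSUMED_GENOME_SIZE` is a hypothetical placeholder. The function name `subsampling_fraction` is also illustrative. The calculation uses the raw 100 bp read length, so it ignores the bases lost to adapter removal and quality trimming.

```python
def subsampling_fraction(n_pairs, read_len_bp, genome_size_bp, target_cov=100.0):
    """Return (current average coverage, fraction of read pairs to keep).

    Coverage is estimated as total sequenced bases / genome size.
    The fraction is capped at 1.0: reads are never up-sampled.
    """
    total_bases = 2 * n_pairs * read_len_bp  # two mates per pair
    current_cov = total_bases / genome_size_bp
    fraction = min(1.0, target_cov / current_cov)
    return current_cov, fraction


# Values taken from the text above.
N_PAIRS = 47_868_586
READ_LEN = 100

# ASSUMPTION: hypothetical genome size, used only to make the example concrete.
ASSUMED_GENOME_SIZE = 40_000_000

cov, frac = subsampling_fraction(N_PAIRS, READ_LEN, ASSUMED_GENOME_SIZE)
print(f"estimated coverage before subsampling: {cov:.1f}x")
print(f"fraction of read pairs to keep for 100x: {frac:.3f}")
```

With a different genome size estimate, only `ASSUMED_GENOME_SIZE` needs to change; the fraction scales linearly with it until it reaches 1.0.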

Structural and functional annotation of the genome was performed using the Funannotate pipeline v1.5.0 (https://github.com/nextgenusfs/funannotate).

The structural annotation step included: (1) repeat masking with the RepeatMasker package (http://www.repeatmasker.org/) using the RepBase repeat libraries [27]; (2) ab initio prediction of protein-coding genes with self-trained GeneMark-ES [28] and with AUGUSTUS [29] trained on BUSCO 2.0 [30] gene models (Phanerochaete chrysosporium was selected as the closely related seed species); (3) ab initio prediction of tRNA genes with tRNAscan-SE [31]; (4) integration and filtering of the resulting gene models.
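As a rough illustration of how the gene prediction step might be launched, the Python sketch below assembles a `funannotate predict` command line without running it. It is not the authors' actual invocation, which this section does not give. The file names, output directory, species label, and CPU count are hypothetical placeholders. The flags (`-i`, `-o`, `-s`, `--busco_seed_species`, `--cpus`) follow the Funannotate documentation as best recalled and should be checked against `funannotate predict --help` for v1.5.0.

```python
import shlex


def build_predict_command(masked_genome, out_dir, species,
                          seed_species="phanerochaete_chrysosporium", cpus=8):
    """Assemble (but do not execute) a funannotate predict command.

    seed_species is the AUGUSTUS species used to seed BUSCO-based training;
    the text names Phanerochaete chrysosporium as the closely related species.
    """
    return [
        "funannotate", "predict",
        "-i", masked_genome,            # repeat-masked assembly (FASTA)
        "-o", out_dir,                  # output directory
        "-s", species,                  # species label for the annotation
        "--busco_seed_species", seed_species,
        "--cpus", str(cpus),
    ]


# ASSUMPTION: every argument below is a placeholder, not a value from the paper.
cmd = build_predict_command("genome.masked.fa", "funannotate_out", "Genus species")
print(shlex.join(cmd))
```

Building the command as a list keeps arguments with spaces (such as the species label) intact if the list is later passed to `subprocess.run`.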

Functional annotation was performed with the Pfam [32], InterPro [33], eggNOG [34], dbCAN [35], MEROPS [36], antiSMASH [37], and BUSCO [30] databases. The prediction of transmembrane topologies and signal peptides was performed with Phobius [38] and SignalP [39], respectively.
