Briefly, gene models were predicted using EuGene (release 4.1). Transcript information was provided to the gene caller as follows: one paired-end (PE) Illumina RNA sequencing (RNA-seq) dataset was assembled into transcript contigs using a de Bruijn graph (DBG) assembly approach with large k-mer sizes to increase assembly specificity. Six O. mediterraneus strains (table S2) had their transcriptomes sequenced as part of the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP). The resulting transcriptomes are listed in table S2 with their respective MMETSP identifiers.
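The text states that large k-mer sizes were used to make the DBG assembly more specific. The toy sketch below illustrates why.

In this sketch, graph nodes are (k-1)-mers and edges are the distinct k-mers observed in the reads. A contig is extended only while each node has exactly one successor. The study does not name its assembler, and this is not that tool. The function names and the example read are hypothetical. The sketch also ignores reverse complements, sequencing errors and coverage, which real assemblers must handle.

```python
from collections import defaultdict
from typing import Dict, List, Set


def de_bruijn_graph(reads: List[str], k: int) -> Dict[str, Set[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph: Dict[str, Set[str]]) -> List[str]:
    """(k-1)-mers with more than one successor, i.e. ambiguous paths."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


def walk_unambiguous(graph: Dict[str, Set[str]], start: str) -> str:
    """Extend a contig from `start` while the next step is unique."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:  # .get does not create new keys
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles
            break
        contig += nxt[-1]
        node = nxt
        seen.add(nxt)
    return contig
```

The read "GGACGTAACGTCC" contains the repeat ACGT twice, with different flanking bases after each copy:

- With k = 4, node CGT has two successors, so the walk from GGA stops at GGACGT.
- With k = 6, every 5-mer is unique, so walking from GGACG recovers the full read.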

Contigs shorter than 300 bp were removed. The remaining set was supplemented with expressed sequence tags (ESTs, >300 bp) retrieved from the National Center for Biotechnology Information (NCBI). These sequences were then mapped onto the genomic scaffolds by combining two tools:

- BLASTN, for rapid detection of regions of interest.
- GenomeThreader, for fine mapping that takes splice sites into account.

GenomeThreader parameters required at least 98% of each transcript sequence to map onto the regions reported by BLASTN.

In addition, RNA-seq reads were mapped directly onto the genome using HISAT2, and junctions were extracted from the BAM alignments with RegTools. The resulting junction instances were counted and grouped by the intron coordinates spanned by each read. This count (or coverage) was used to retain only junctions supported by at least three reads.

Functional descriptions were inferred from filtered best BLAST hits and combined with protein domains obtained with InterPro. This automatic annotation was curated using the Online Resource for Community Annotation of Eukaryotes (ORCAE) portal. All predicted genes were manually inspected. Of the 8483 protein-coding genes, 1142 were modified (including gene fusions) and 59 were discarded, resulting in a final set of 8110 protein-coding genes. The functional description was manually edited for 662 genes.
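The three numeric thresholds quoted above can be expressed as simple filters:

- a minimum contig length of 300 bp;
- at least 98% of a transcript mapped;
- at least three reads supporting a junction.

The sketch below assumes the aligned length and the junction coordinates have already been parsed from the mapping output. The `Junction` record, the function names and the scaffold names are hypothetical; they are not part of HISAT2, RegTools or GenomeThreader.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple

# Thresholds quoted in the text.
MIN_CONTIG_LEN = 300        # contigs shorter than this are discarded
MIN_MAPPED_FRACTION = 0.98  # minimum share of a transcript that must map
MIN_JUNCTION_READS = 3      # minimum number of reads supporting a junction


class Junction(NamedTuple):
    """Hypothetical record: the intron spanned by one split read."""
    scaffold: str
    start: int
    end: int
    strand: str


def filter_contigs(contigs: Dict[str, str],
                   min_len: int = MIN_CONTIG_LEN) -> Dict[str, str]:
    """Keep only contigs of at least `min_len` bp."""
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_len}


def passes_mapping(aligned_bp: int, transcript_bp: int,
                   min_fraction: float = MIN_MAPPED_FRACTION) -> bool:
    """True if the aligned part covers at least `min_fraction` of the transcript."""
    return transcript_bp > 0 and aligned_bp / transcript_bp >= min_fraction


def supported_junctions(observed: Iterable[Junction],
                        min_reads: int = MIN_JUNCTION_READS) -> Dict[Junction, int]:
    """Group reads by intron coordinates; keep junctions seen >= min_reads times."""
    counts = Counter(observed)  # identical coordinates collapse into one key
    return {junction: n for junction, n in counts.items() if n >= min_reads}
```

Using a `NamedTuple` as the grouping key makes "same intron coordinates" an exact match on scaffold, start, end and strand. The text does not say whether strand was part of that key, so including it here is an assumption.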

Gene prediction in OmV2 was performed using Glimmer, as implemented in Geneious. Predicted OmV2 genes were functionally annotated using BLASTX, with default parameters, against the non-redundant (NR) database (December 2017 version). They were also searched against the Pfam database (version 31.0). tRNAs were predicted using the tRNAscan-SE 2.0 web server.
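Both the nuclear annotation ("filtered best BLAST hits") and the OmV2 BLASTX search rely on selecting one best hit per query. The sketch below shows one way to do this. Its assumptions are:

- The results are in BLAST+ tabular format (`-outfmt 6`, default 12 columns).
- Hits are ranked by bit score, with ties broken by e-value.
- The e-value cutoff of 1e-5 is hypothetical, because the text does not state which filters were applied.

```python
import csv
from typing import Dict, Iterable, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                   "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# Hypothetical cutoff: the text does not state the filtering criteria.
MAX_EVALUE = 1e-5


def best_hits(lines: Iterable[str],
              max_evalue: float = MAX_EVALUE) -> Dict[str, Tuple[str, float, float]]:
    """Return {query: (subject, evalue, bitscore)} for the top-scoring hit per query."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        evalue, bitscore = float(hit["evalue"]), float(hit["bitscore"])
        if evalue > max_evalue:
            continue  # too weak to transfer a functional description
        current = best.get(hit["qseqid"])
        # Prefer the higher bit score; break ties with the lower e-value.
        if current is None or (bitscore, -evalue) > (current[2], -current[1]):
            best[hit["qseqid"]] = (hit["sseqid"], evalue, bitscore)
    return best
```

An open file handle can be passed directly, for example `best_hits(open("omv2_vs_nr.tsv"))`, where the file name is hypothetical. The subject IDs returned can then be looked up to retrieve their descriptions.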
