We reanalyzed 118 Salinispora genomes (17) representing strains isolated from sponges and globally distributed marine sediments (96% of isolates). We previously assigned each strain to one of nine species based on genotypic and phenotypic characteristics (26) (Table S1 in the supplemental material). For each genome, protein-coding regions were predicted and annotated using Prokka v1.13.3 (63), and orthologs shared across all genomes were identified with Roary v3.12.0 (64) using a minimum sequence identity of 85%. The resulting 2,106 potential orthologs were individually aligned using Clustal Omega (65) and screened for complete codon reading frames. The final 2,011 single-copy orthologs were concatenated to infer a core genome phylogeny using RAxML v8.2.10 (66) under the general time-reversible (GTR) model with a gamma distribution and 100 replicates (Fig. 1A). Any orthologs not shared among all strains were assigned to the flexible genome. Species-specific flexible genes (defined as genes shared by all strains within a species but not observed in any other species) were functionally annotated with GhostKOALA against the nonredundant set of KEGG genes (67) for all Salinispora species with ≥3 genome sequences.
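The core, flexible, and species-specific definitions above can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `partition_genes`, its arguments, and the toy strain and gene labels are all hypothetical, and the input is assumed to be a gene presence/absence mapping of the kind that could be derived from an ortholog-clustering table.

```python
from collections import defaultdict


def partition_genes(presence, species_of, min_genomes=3):
    """Split ortholog groups into core, flexible, and species-specific sets.

    presence   -- dict: gene name -> set of strains carrying that gene
    species_of -- dict: strain name -> species label
    min_genomes -- species with fewer genomes are skipped (3 in the text)
    """
    all_strains = set(species_of)

    # Group strains by their species assignment.
    strains_by_species = defaultdict(set)
    for strain, species in species_of.items():
        strains_by_species[species].add(strain)

    # Core: present in every strain. Flexible: everything else.
    core = {g for g, strains in presence.items() if strains >= all_strains}
    flexible = set(presence) - core

    # Species-specific: present in all strains of one species and in no
    # strain of any other species, i.e. the carrier set equals the species.
    species_specific = {}
    for species, members in strains_by_species.items():
        if len(members) < min_genomes:
            continue
        species_specific[species] = {
            g for g in flexible if presence[g] == members
        }
    return core, flexible, species_specific


# Toy data (hypothetical): two species with 3 genomes, one with a single genome.
species_of = {
    "A1": "sp_A", "A2": "sp_A", "A3": "sp_A",
    "B1": "sp_B", "B2": "sp_B", "B3": "sp_B",
    "C1": "sp_C",
}
presence = {
    "g_core": set(species_of),          # in all strains
    "g_A": {"A1", "A2", "A3"},          # all of sp_A, nowhere else
    "g_A_partial": {"A1", "A2"},        # missing from one sp_A strain
    "g_AB": {"A1", "A2", "A3", "B1"},   # leaks into sp_B
    "g_C": {"C1"},                      # sp_C has too few genomes
}

core, flexible, species_specific = partition_genes(presence, species_of)
```

In this toy case only `g_core` is core, only `g_A` qualifies as species-specific for `sp_A`, and `sp_C` is skipped because it has fewer than three genomes. Requiring the carrier set to equal the species membership exactly is one straightforward reading of "shared by all strains within a species but not observed in any other species".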