Clusters of orthologous genes (COGs) in each taxonomic group were identified by an in-house Python pipeline that ran reciprocal local BLASTP alignments of all protein-coding genes of a genome against the protein-coding genes of the other genomes within the same taxonomic group. Pairs of genes showing reciprocal sequence similarity with E-values ≤ 0.0001 were considered orthologous.
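The reciprocal filtering step can be illustrated with a minimal Python sketch. The in-house pipeline itself is not reproduced here; the function name `reciprocal_pairs`, the `(query, subject, E-value)` tuple layout and the example gene identifiers are assumptions made for illustration, and the hits are assumed to have been parsed already from BLASTP output.

```python
from typing import Iterable, Set, Tuple

# Assumed layout of one parsed BLASTP hit: (query gene, subject gene, E-value).
Hit = Tuple[str, str, float]


def reciprocal_pairs(
    a_vs_b: Iterable[Hit],
    b_vs_a: Iterable[Hit],
    max_evalue: float = 1e-4,
) -> Set[Tuple[str, str]]:
    """Return (gene_a, gene_b) pairs that hit each other in both directions.

    A pair is kept only if the A-against-B search and the B-against-A search
    both report the pair with an E-value at or below max_evalue.
    """
    forward = {(q, s) for q, s, e in a_vs_b if e <= max_evalue}
    reverse = {(q, s) for q, s, e in b_vs_a if e <= max_evalue}
    return {(a, b) for (a, b) in forward if (b, a) in reverse}


if __name__ == "__main__":
    # Hypothetical hits between genome A and genome B.
    hits_ab = [("a1", "b1", 1e-30), ("a2", "b2", 1e-3), ("a3", "b3", 1e-10)]
    hits_ba = [("b1", "a1", 1e-28), ("b2", "a2", 1e-20), ("b3", "a4", 1e-10)]
    print(reciprocal_pairs(hits_ab, hits_ba))
```

In this example only the pair `a1`/`b1` passes the threshold in both directions; `a2`/`b2` fails the forward threshold, and `a3`/`b3` has no matching reverse hit.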
All COGs were aligned using the MUSCLE algorithm.32 Alignment ambiguities were removed with the program Gblocks.33 Evolutionary distances between proteins were estimated using the Jones-Taylor-Thornton (JTT) substitution model implemented in the program protdist. For alignments of 16S rRNA sequences, the Felsenstein F84 substitution model implemented in the program dnadist of the PHYLIP package34 was used. Phylogenetic inferences were based on the JTT/F84 distance tables and performed with the program neighbor of the PHYLIP package. Neighbour joining (NJ) trees were inferred for every COG and for the alignments of 16S rRNA. Whole genome super-matrix (WGS) trees were inferred from concatenated alignments of all COGs translated into protein sequences (excluding 16S rRNA). Three types of annotation- and alignment-free trees were also calculated using whole genome sequence data. The OUP comparison was performed using the program LingvoCom 1.0 (http://www.bi.up.ac.za/SeqWord/lingvocom/index.html); phylogenomic inference by whole genome sequence alignment was executed by the program MAUVE 10.28 Finally, the CVTree alignment-free algorithm, based on genome-scale oligo-protein k-string vector comparison, was used to estimate phylogenomic distances between microorganisms.35
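The concatenation behind a super-matrix tree can be sketched as follows. This is an illustration rather than the pipeline used in the study: the function name `concatenate_alignments` and the dictionary layout are hypothetical, and padding a taxon that is absent from a COG with gap characters is an assumption, since the text does not say how missing genes were handled.

```python
from typing import Dict, List


def concatenate_alignments(
    alignments: List[Dict[str, str]],
    taxa: List[str],
) -> Dict[str, str]:
    """Join per-COG alignments into one super-matrix row per taxon.

    Each alignment maps a taxon name to its aligned protein sequence.
    Assumption: a taxon missing from a COG is padded with '-' so that
    every row of the super-matrix keeps the same length.
    """
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for alignment in alignments:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences in one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(alignment.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


if __name__ == "__main__":
    # Two hypothetical COG alignments over three taxa.
    cog1 = {"A": "MK-L", "B": "MKAL"}
    cog2 = {"A": "GG", "C": "GA"}
    for taxon, row in concatenate_alignments([cog1, cog2], ["A", "B", "C"]).items():
        print(taxon, row)
```

The resulting rows could then be written out in the input format expected by a distance program such as protdist.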
In addition, sets of artificial DNA sequences of 1 Mb simulating phylogenetic relationships were generated by the SimBac program31 to test the performance of the SWPhylo program.
Topologies of phylogenetic trees were compared using the Symmetric Difference and Branch Score Distance (BSD) algorithms implemented in the program treedist of the PHYLIP package.36 The symmetric algorithm compares only the topologies of the trees, whereas the BSD algorithm also accounts for branch lengths.34
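The idea behind the symmetric comparison can be shown with a short Python sketch that counts the bipartitions (splits) present in one tree but not in the other. This is not the PHYLIP treedist code; trees are represented here as nested tuples of leaf names instead of Newick files, and all function names are hypothetical.

```python
from typing import FrozenSet, Set, Tuple, Union

# Assumed tree representation: a leaf is a string, an internal node is a tuple.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the non-trivial splits of a tree, treated as unrooted.

    Each split is stored as the side that does not contain the
    alphabetically smallest leaf, so equal splits compare equal.
    """
    all_leaves = leaves(tree)
    anchor = min(all_leaves)
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if anchor in clade else clade
        # Skip trivial splits that separate fewer than two leaves from the rest.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found


def symmetric_distance(tree1: Tree, tree2: Tree) -> int:
    """Count the splits that occur in exactly one of the two trees."""
    if leaves(tree1) != leaves(tree2):
        raise ValueError("both trees must contain the same leaves")
    return len(splits(tree1) ^ splits(tree2))


if __name__ == "__main__":
    t1 = ((("A", "B"), "C"), ("D", "E"))
    t2 = ((("A", "C"), "B"), ("D", "E"))
    print(symmetric_distance(t1, t2))
```

Because only the presence or absence of splits is compared, branch lengths play no role in this measure, in contrast to the BSD.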