To estimate the strength of purifying selection, we used the ratio of nonsynonymous to synonymous substitution rates (dN/dS). For values below 1, lower ratios indicate stronger purifying selection, whereas higher ratios indicate weaker purifying selection and thus a greater influence of genetic drift. To calculate genome-wide dN/dS ratios, we used two sets of conserved marker genes expected to be present in most genomes. The first consists of 120 phylogenetic marker genes that are highly conserved in Bacteria, which we also used for phylogenetic reconstruction (Parks et al. 2015). The second consists of 40 phylogenetic marker genes used in phylogenetic reconstruction, which we refer to as the EMBL set because it was developed at the European Molecular Biology Laboratory (Sunagawa et al. 2013).
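For reference, dN and dS are substitution counts normalized by the number of available sites of each type. The symbol ω is the notation PAML uses for this ratio and is introduced here only for convenience. Because codeml estimates dN and dS by maximum likelihood, these expressions define the quantities rather than describe how they were computed.

```latex
\omega = \frac{d_N}{d_S}, \qquad
d_N = \frac{\text{nonsynonymous substitutions}}{\text{nonsynonymous sites}}, \qquad
d_S = \frac{\text{synonymous substitutions}}{\text{synonymous sites}}

\omega \begin{cases}
< 1 & \text{purifying selection (stronger as } \omega \to 0\text{)} \\
\approx 1 & \text{neutral evolution} \\
> 1 & \text{positive selection}
\end{cases}
```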
For both marker gene sets, we predicted proteins from each genome using Prodigal and then annotated the marker genes of interest with the hmmsearch tool of HMMER3, using model-specific cutoffs. We aligned the amino acid sequences of each annotated gene from the Marinimicrobia genomes separately using Clustal Omega and converted the resulting alignments into codon alignments using PAL2NAL (Suyama et al. 2006). We then ran the maximum-likelihood program codeml from the PAML 4.9h package (Yang 2007) through Biopython to compute pairwise dN/dS values within the previously established clades (Getz et al. 2018).

We removed dN/dS values with dS ≥ 1, because such values indicate that synonymous substitutions are near saturation. To avoid comparing genomes that may belong to the same population, we also excluded comparisons for which dN = 0 and dS ≤ 0.01. We further discarded all dN/dS values ≥ 10, as these were largely artifactual. Lastly, because we wished to compare dN/dS values between Marinimicrobia from different habitats, we retained only comparisons in which both genomes came from the same habitat (both epipelagic or both mesopelagic). All values used can be found in supplementary data set S1, Supplementary Material online.
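The two data-handling steps around codeml can be sketched in Python as follows. `protein_to_codon_alignment` illustrates the back-translation idea behind PAL2NAL (a simplified stand-in, not PAL2NAL itself), and `passes_filters`/`filtered_ratios` apply the four post-hoc filters described above to pairwise dN and dS estimates. The `PairwiseResult` record, its field names, and all function names are hypothetical; in practice dN and dS would come from codeml output, and the pairs are assumed to be already restricted to within-clade comparisons.

```python
from dataclasses import dataclass
from typing import Iterable, List


def protein_to_codon_alignment(aligned_protein: str, cds: str) -> str:
    """Back-translate one aligned protein into an in-frame codon alignment.

    Every residue is replaced by its codon from the unaligned CDS and every
    gap by '---'. Simplified sketch: assumes the CDS encodes the protein
    exactly (no frameshifts or internal stop codons).
    """
    # Split the CDS into codons; an incomplete trailing codon is ignored.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues > len(codons):
        raise ValueError("CDS is shorter than the protein it should encode")
    out, k = [], 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            out.append(codons[k])
            k += 1
    return "".join(out)


@dataclass
class PairwiseResult:
    """One pairwise dN/dS estimate (hypothetical record layout)."""
    genome_a: str
    genome_b: str
    habitat_a: str  # "epipelagic" or "mesopelagic"
    habitat_b: str
    dn: float
    ds: float


def passes_filters(r: PairwiseResult) -> bool:
    """Apply the post-hoc filters described in the text."""
    if r.ds >= 1:                      # synonymous sites near saturation
        return False
    if r.dn == 0 and r.ds <= 0.01:     # genomes may be from the same population
        return False
    if r.ds == 0:                      # dN > 0 here, so dN/dS is infinite (>= 10)
        return False
    if r.dn / r.ds >= 10:              # largely artifactual ratios
        return False
    return r.habitat_a == r.habitat_b  # keep same-habitat pairs only


def filtered_ratios(results: Iterable[PairwiseResult]) -> List[float]:
    """Return dN/dS for every comparison that survives all filters."""
    return [r.dn / r.ds for r in results if passes_filters(r)]


if __name__ == "__main__":
    print(protein_to_codon_alignment("M-K", "ATGAAATAA"))  # ATG---AAA
    pairs = [
        PairwiseResult("g1", "g2", "epipelagic", "epipelagic", 0.125, 0.5),
        PairwiseResult("g1", "g3", "epipelagic", "epipelagic", 0.25, 1.5),
        PairwiseResult("g2", "g3", "epipelagic", "epipelagic", 0.0, 0.005),
        PairwiseResult("g3", "g4", "mesopelagic", "mesopelagic", 0.5, 0.04),
        PairwiseResult("g1", "g4", "epipelagic", "mesopelagic", 0.125, 0.5),
    ]
    print(filtered_ratios(pairs))  # [0.25]
```

Keeping each criterion as a separate early return makes it straightforward to count how many comparisons each filter removes, which is useful when reporting the size of the retained data set.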