NAPA can build phylogeny-based networks from either a single phylogenetic tree or a Bayesian ensemble of trees sampled from the posterior probability distribution. Because of the low sequence divergence in the TEM and CTX-M-3 clusters, the coding DNA alignment rather than the protein sequence alignment was used as input for phylogeny inference and ancestral reconstruction. The phylogenetic trees in each protein cluster were outgroup rooted. For each protein of interest, a closely related protein cluster was found by constructing an identity-based neighbor-joining (NJ) tree from the alignment of the protein cluster with a set of homologous protein clusters. For TEM, the NJ tree included SHV and CARB-type β-lactamases; for CTX-M-3 and OXA-51-like, the NJ trees included all CTX-M and OXA clusters, respectively. In this way, the SHV-1, CTX-M-14, and OXA-213 coding DNA sequences were identified as outgroups for the TEM, CTX-M-3, and OXA-51-like clusters, respectively.
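The identity-based distance that underlies such an NJ tree can be illustrated with a minimal Python sketch. It covers only the distance step, not NJ tree construction itself, and it is not the authors' code. The sequence names, the toy aligned sequences, and the choice to skip columns where both sequences have a gap are assumptions made for illustration.

```python
def identity_distance(seq_a, seq_b, gap="-"):
    """Return 1 - fraction of identical positions between two aligned sequences.

    Columns in which both sequences carry a gap are skipped (an assumption;
    other gap treatments are possible).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 1.0 - matches / compared


# Hypothetical toy alignment: one query and two candidate outgroup sequences.
alignment = {
    "query_1": "MSIQHFRVAL",
    "cand_A": "MSIQHFRVSL",
    "cand_B": "MRYIRLCIIS",
}

distances = {
    name: identity_distance(alignment["query_1"], seq)
    for name, seq in alignment.items()
    if name != "query_1"
}
# The candidate with the smallest identity distance is the closest relative.
nearest = min(distances, key=distances.get)
```

Picking the minimum pairwise distance is a simplification; in the procedure described above, the closest cluster is read from the topology of the NJ tree built from the full distance matrix.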
The TEM phylogeny was inferred using the MrBayes Metropolis-coupled MCMC method (Huelsenbeck and Ronquist 2001), whereas the CTX-M-3 and OXA-51-like phylogenies were constructed with GARLi, a genetic algorithm method for maximum likelihood inference (Zwickl 2006). For each cluster, the input coding DNA sequence alignment was partitioned by codon position (Lanfear et al. 2012), and the generalized time-reversible model of nucleotide substitution with gamma-distributed rates and a proportion of invariant (invariable) sites (GTR + G + I) was used. For the TEM cluster, six independent runs were performed in MrBayes, each for 30 million generations, including a burn-in of 15 million generations. Trees were thinned to every 20,000th generation to remove autocorrelation between phylogeny parameters. Burn-in and thinning parameters were determined from standard MCMC convergence diagnostics. The genetic algorithm run parameters for the CTX-M-3 and OXA-51-like clusters were a population size of 4 individuals, a selection intensity of 0.25, and 2 million generations.
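The effect of the burn-in and thinning settings on the size of the tree ensemble can be worked through with a short Python sketch. It assumes that the burn-in boundary itself is discarded and that a tree is available at every multiple of the thinning interval; the function name is hypothetical, and the resulting counts follow from these assumptions rather than from figures reported in the text.

```python
def retained_generations(total_gen, burn_in, thin_every):
    """Return the generations kept after discarding burn-in and thinning.

    Assumes generations <= burn_in are discarded and that every multiple
    of thin_every after the burn-in is retained.
    """
    first_kept = (burn_in // thin_every + 1) * thin_every
    return range(first_kept, total_gen + 1, thin_every)


# Settings stated for the TEM cluster MrBayes runs.
kept = retained_generations(
    total_gen=30_000_000, burn_in=15_000_000, thin_every=20_000
)
trees_per_run = len(kept)          # trees retained from one run
trees_total = 6 * trees_per_run    # pooled over six independent runs
```

Under these assumptions each run contributes 750 trees, for 4,500 trees across the six runs.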
After phylogenies were reconstructed for each cluster, the coding DNA sequences at the internal nodes of the trees (ancestral sequences) were inferred by maximum likelihood (Knight et al. 2007), using the same nucleotide substitution model (GTR + G + I) as for phylogeny reconstruction. The leaf and reconstructed internal node sequences were translated to protein sequences and included in the input to NAPA, along with the phylogenetic trees (supplementary fig. 3, Supplementary Material online).
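The translation step can be sketched in Python as follows. This is an illustrative sketch, not the authors' pipeline: the node names and sequences are hypothetical, the standard genetic code is assumed, and the handling of gap codons (`---` becomes `-`) and of unrecognized codons (mapped to `X`) is an assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame (optionally gapped) coding DNA sequence."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    protein = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        if codon == "---":
            protein.append("-")   # keep alignment gaps aligned
        else:
            protein.append(CODON_TABLE.get(codon, "X"))  # X = unresolved
    return "".join(protein)


# Hypothetical leaf and reconstructed internal node sequences.
node_dna = {
    "leaf_1": "ATGAGTATTCAACATTTCCGT",
    "internal_7": "ATGAGTATT---CATTTCCGT",
}
node_protein = {name: translate(seq) for name, seq in node_dna.items()}
```

Translating gap codons to a single `-` keeps the leaf and internal node protein sequences in the same alignment coordinates, which is convenient when mutations are later assigned to tree edges.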