4.6. Bayesian Phylogenetic and Phylogeographical Reconstructions

Erika Ebranati, Alessandro Mancon, Martina Airoldi, Silvia Renica, Renata Shkjezi, Pranvera Dragusha, Carla Della Ventura, Anna Rita Ciccaglione, Massimo Ciccozzi, Silvia Bino, Elisabetta Tanzi, Valeria Micheli, Elisabetta Riva, Massimo Galli, Gianguglielmo Zehender

The main dataset and the subsets were aligned using ClustalW, as implemented in BioEdit [46], and then edited manually (final alignment length: 211 nucleotides).

jModelTest was used to select the simplest evolutionary model that fitted the data: the GTR+I+G model of nucleotide substitution for the main dataset and the Albanian subset, and the GTR+G model for the Italian subset.
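
The text does not state which information criterion was used within jModelTest. As an illustration only, the sketch below assumes the Akaike information criterion (AIC = 2k - 2 ln L) and shows why extra components such as +I or +G are kept only when they raise the log-likelihood enough to pay for their extra parameters. The model names follow the text; the function names and the log-likelihood values are hypothetical, and branch-length parameters are left out because, on a fixed topology, they add the same constant to every model.

```python
# Illustrative AIC-based model choice (assumption: the criterion actually
# used in jModelTest for this study is not reported).

# Free substitution-model parameters: exchangeability rates + base frequencies.
# Branch lengths are omitted; on a fixed topology they add the same constant
# to every model and therefore do not change the ranking.
BASE_PARAMS = {"JC": 0, "K80": 1, "HKY": 4, "GTR": 8}


def n_params(model: str) -> int:
    """Count free parameters for names such as 'GTR', 'GTR+G' or 'GTR+I+G'."""
    base, *extras = model.split("+")
    k = BASE_PARAMS[base]
    k += extras.count("I")  # proportion of invariable sites
    k += extras.count("G")  # gamma shape parameter for among-site rate variation
    return k


def aic(log_likelihood: float, k: int) -> float:
    return 2 * k - 2 * log_likelihood


def select_by_aic(log_likelihoods: dict[str, float]) -> str:
    """Return the model with the lowest AIC."""
    scores = {m: aic(lnl, n_params(m)) for m, lnl in log_likelihoods.items()}
    return min(scores, key=scores.get)


# Hypothetical log-likelihoods, not values from the study.
example = {"HKY+G": -2000.0, "GTR+G": -1990.0, "GTR+I+G": -1985.0}
print(select_by_aic(example))  # GTR+I+G (AIC 3990 vs 3998 and 4010)
```

In this toy case GTR+I+G is preferred because its log-likelihood gain over GTR+G (5 units, worth 10 AIC points) outweighs the penalty for one extra parameter (2 AIC points).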

The phylogenetic tree, model parameters, evolutionary rates, and population growth were co-estimated using the Bayesian Markov chain Monte Carlo (MCMC) method implemented in BEAST v.1.8.4 [47]. Statistical support for specific clades was assessed by calculating the posterior probability of each monophyletic clade. Four simple parametric demographic models (constant population size, and exponential, expansion, and logistic population growth) and a piecewise-constant Bayesian skyline plot (BSP) were compared as coalescent priors, each under both a strict and a relaxed (uncorrelated log-normal) molecular clock [47].
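
A clade's posterior probability is the fraction of post-burn-in trees sampled by the MCMC that contain that clade. The sketch below assumes a deliberately simplified tree representation (each tree as a frozenset of clades, each clade a frozenset of taxon labels) rather than the NEXUS tree files BEAST writes, and the function names and toy trees are hypothetical. It also shows how the same frequencies are used to pick a maximum clade credibility tree, i.e. the sampled tree whose clades have the largest product of posterior probabilities.

```python
import math
from collections import Counter

# Simplified representation (assumption, for illustration only): each sampled
# tree is a frozenset of clades, and each clade is a frozenset of taxon labels.


def clade_posteriors(trees, burn_in=0.1):
    """Fraction of post-burn-in trees that contain each clade."""
    kept = trees[int(len(trees) * burn_in):]
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


def mcc_tree(trees, burn_in=0.1):
    """Sampled tree maximising the product of its clade posteriors, computed
    as a sum of logs to avoid underflow. Every clade of a kept tree has a
    posterior > 0, so the logarithm is always defined."""
    kept = trees[int(len(trees) * burn_in):]
    pp = clade_posteriors(trees, burn_in)
    return max(kept, key=lambda tree: sum(math.log(pp[c]) for c in tree))


# Toy example with three taxa (hypothetical labels).
ab, bc, abc = frozenset("AB"), frozenset("BC"), frozenset("ABC")
t1, t2 = frozenset({ab, abc}), frozenset({bc, abc})
samples = [t2] + [t1] * 6 + [t2] * 3  # 10 samples; the first is burn-in
pp = clade_posteriors(samples)
print(round(pp[ab], 3), round(pp[bc], 3), pp[abc])  # 0.667 0.333 1.0
print(mcc_tree(samples) == t1)  # True
```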

The phylogeographical reconstruction used the continuous-time Markov chain (CTMC) process over discrete sampling locations implemented in BEAST [48], together with the Bayesian stochastic search variable selection (BSSVS) model, which allows diffusion rates to be zero with a positive prior probability. Comparing the posterior and prior probabilities that each individual rate is zero yields a formal Bayes factor (BF) for testing the significance of the link between two locations: rates with a BF >3 were considered well supported and were taken to represent migration pathways.
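
The Bayes factor for each rate divides the posterior odds that the rate is non-zero by the corresponding prior odds. The sketch below assumes symmetric rates and the prior commonly paired with BSSVS in BEAST: a Poisson prior on the number of non-zero rates with mean ln 2 and an offset of K - 1, where K is the number of locations, following the formulation described for the SPREAD visualization tools. The function names, location names and indicator traces are hypothetical.

```python
import math


def indicator_posterior(indicator_trace):
    """Posterior probability that a rate is non-zero: the fraction of
    post-burn-in samples in which its BSSVS indicator equals 1."""
    return sum(indicator_trace) / len(indicator_trace)


def prior_inclusion_probability(n_locations):
    """Prior probability that a given symmetric rate is non-zero, assuming a
    Poisson prior with mean ln(2) plus an offset of K - 1 non-zero rates."""
    if n_locations < 3:
        raise ValueError("this prior gives a probability below 1 only for K >= 3")
    n_rates = n_locations * (n_locations - 1) / 2
    return (math.log(2) + n_locations - 1) / n_rates


def bssvs_bayes_factor(posterior_p, n_locations):
    """Posterior odds divided by prior odds that the rate is non-zero."""
    q = prior_inclusion_probability(n_locations)
    if posterior_p >= 1.0:
        return math.inf  # indicator was switched on in every sample
    return (posterior_p / (1 - posterior_p)) / (q / (1 - q))


def supported_pathways(indicator_traces, n_locations, threshold=3.0):
    """Location pairs whose rate BF exceeds the threshold (BF > 3 in the text)."""
    result = {}
    for pair, trace in indicator_traces.items():
        bf = bssvs_bayes_factor(indicator_posterior(trace), n_locations)
        if bf > threshold:
            result[pair] = bf
    return result


# Hypothetical indicator traces for two of the rates among 10 locations.
traces = {
    ("LocationA", "LocationB"): [1] * 80 + [0] * 20,  # p = 0.8
    ("LocationA", "LocationC"): [1] * 20 + [0] * 80,  # p = 0.2
}
print(supported_pathways(traces, n_locations=10))  # only the A-B pair (BF about 14.6)
```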

The HCV-2 dataset and the Italian HCV-2c subset were analysed by running two independent MCMC chains for 500 million generations, sampling every 50,000 generations; the Albanian HCV-2c subset was analysed using 50 million generations, sampling every 5000 generations. The runs were combined using LogCombiner v. 1.80 from the BEAST package. Convergence was assessed on the basis of the effective sample size (ESS) after a 10% burn-in, using Tracer v. 1.5 (http://tree.bio.ed.ac.uk/software/tracer/, accessed on 16 February 2021); only ESS values ≥200 were accepted. Uncertainty in the estimates was expressed as 95% highest posterior density (95% HPD) intervals.

The best-fitting models were selected by comparing marginal likelihoods using the BF as implemented in BEAST [49]; following Kass [50], only 2lnBF values ≥6 were considered significant.

The trees were summarized in a target tree using TreeAnnotator from the BEAST package, selecting the tree with the maximum product of clade posterior probabilities (the maximum clade credibility, MCC, tree) after a 10% burn-in. Estimates of the time of the most recent common ancestor (tMRCA) were expressed as the mean number of years (with 95% HPD) before the most recent sampling date (2016). The final trees were visualized using FigTree v. 1.4 (available at http://tree.bio.ed.ac.uk/software, accessed on 16 February 2021).

To visualize diffusion over time, the location-annotated MCC tree can also be converted to GeoJSON, a format suitable for viewing with georeferencing software. Here, the new SPREAD3 analytical tool was used to convert the MCC tree to a JavaScript Object Notation (JSON) file, and the visualization was rendered using the Data-Driven Documents (D3) library [51].
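
The convergence and summary steps above can be approximated in a few lines. This is a simplified sketch, not the algorithms Tracer, LogCombiner or BEAST actually implement. The ESS estimator truncates the autocorrelation sum at the first non-positive lag (one common heuristic). The HPD is taken as the shortest interval containing 95% of the samples. 2lnBF is computed from two natural-log marginal likelihoods, and node heights are converted to calendar years using 2016 as the most recent sampling date. All function names and example traces are hypothetical.

```python
import math
import random


def discard_burn_in(trace, fraction=0.1):
    """Drop the first `fraction` of the samples."""
    return trace[int(len(trace) * fraction):]


def combine_runs(traces, fraction=0.1):
    """Concatenate independent runs after removing burn-in from each one."""
    combined = []
    for trace in traces:
        combined.extend(discard_burn_in(trace, fraction))
    return combined


def ess(trace):
    """ESS = N / (1 + 2 * sum of autocorrelations), with the sum truncated
    at the first non-positive autocorrelation."""
    n = len(trace)
    mean = sum(trace) / n
    dev = [x - mean for x in trace]
    var = sum(d * d for d in dev) / n
    if var == 0:
        raise ValueError("ESS is undefined for a constant trace")
    rho_sum = 0.0
    for lag in range(1, n):
        rho = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples."""
    xs = sorted(samples)
    m = max(1, math.ceil(mass * len(xs)))  # number of samples inside the interval
    _, i = min((xs[j + m - 1] - xs[j], j) for j in range(len(xs) - m + 1))
    return xs[i], xs[i + m - 1]


def two_ln_bf(log_ml_1, log_ml_2):
    """2 ln BF of model 1 over model 2 from natural-log marginal likelihoods."""
    return 2.0 * (log_ml_1 - log_ml_2)


def tmrca_year(height, most_recent_sampling=2016.0):
    """Convert a node height (years before the latest sample) to a calendar year."""
    return most_recent_sampling - height


# Two hypothetical, strongly autocorrelated AR(1) chains.
random.seed(1)
chains = []
for _ in range(2):
    x, chain = 0.0, []
    for _ in range(5000):
        x = 0.95 * x + random.gauss(0.0, 1.0)
        chain.append(x)
    chains.append(chain)
trace = combine_runs(chains)
trace_ess = ess(trace)
print(len(trace), round(trace_ess), trace_ess >= 200)
print(two_ln_bf(-1000.0, -1004.0) >= 6)  # True (2lnBF = 8.0)
print(hpd_interval([1, 2, 3, 4, 100], mass=0.8))  # (1, 4)
print(tmrca_year(50.0))  # 1966.0
```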
