Phylogenetic analysis was conducted on the mitogenome dataset to explore the evolutionary relationships and to estimate the time to the most recent common ancestor (TMRCA). First, to investigate the overall evolutionary relationships, a neighbour‐joining (NJ) phylogeny was built in MEGA v6.06 (Tamura et al., 2013) using the TrN substitution model and 1000 bootstrap replicates. Then, Bayesian phylogenetic analysis and estimation of TMRCA were performed using BEAST v1.8.4 (Drummond & Rambaut, 2007). A concatenated dataset including all 13 mitochondrial protein‐coding genes was analyzed. The sequences were further divided into the three codon positions to allow separate estimation of the mutation rate for each position.

The best substitution model was chosen based on a maximum likelihood search conducted in MEGA. Initial analysis found HKY + G + I to be the best‐fitting substitution model for the dataset. The final analysis was nevertheless based on the simpler HKY model because of problems with MCMC convergence in BEAST under HKY + G + I. The phylogeny and the estimated TMRCA of all major groups were, however, extremely similar between analyses, suggesting that the choice of substitution model had a limited effect.

Initial phylogenetic analysis showed the presence of three major clades. These may have different demographic histories, which could affect divergence estimates if not accounted for (Ho et al., 2008). Thus, to allow for differences in population size changes among clades, the final analysis was conducted in *BEAST (Heled & Drummond, 2010). This program uses two different tree priors: a speciation prior for the between‐clade branching pattern and a coalescent prior for the within‐clade branching patterns, which allows different demographic histories to be inferred for different lineages. Three models of effective population size change were used as coalescent priors (piecewise linear and constant root, piecewise linear, and piecewise constant) to compare the robustness of the divergence estimates given differences in demographic history. The Yule prior was used as the between‐clade prior. To identify the best set of models, Bayes factors were calculated in BEAST as proposed by Suchard et al. (2001). All phylogenetic analyses used a strict (constant‐rate) clock, as a preliminary test in BEAST showed no significant rate heterogeneity among branches. Both the slow and the fast mutation rates were used to estimate divergence time in years.
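The Bayes factor comparison described above can be illustrated with a minimal sketch. It assumes that a log marginal likelihood estimate is already available for each candidate model; how BEAST produces these estimates is not reproduced here. The model labels and numerical values are hypothetical placeholders, not results from this study.

```python
# Hypothetical log marginal likelihoods for three candidate models.
# Placeholder values for illustration only, NOT results from the study.
log_marginal_likelihoods = {
    "model_A": -25010.4,
    "model_B": -25013.9,
    "model_C": -25018.2,
}


def log_bayes_factor(log_ml_a: float, log_ml_b: float) -> float:
    """Natural-log Bayes factor of model A over model B."""
    return log_ml_a - log_ml_b


def rank_models(log_mls: dict) -> tuple:
    """Return the best-supported model and 2*ln(BF) of that model over each model."""
    best = max(log_mls, key=log_mls.get)
    two_ln_bf = {
        name: 2.0 * log_bayes_factor(log_mls[best], value)
        for name, value in log_mls.items()
    }
    return best, two_ln_bf


best_model, support = rank_models(log_marginal_likelihoods)
for name, value in sorted(support.items(), key=lambda item: item[1]):
    print(f"{best_model} vs {name}: 2 ln BF = {value:.1f}")
```

A larger value of 2 ln BF indicates stronger support for the best model over the alternative; the best model compared with itself gives zero.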
The final MCMC samples were based on a run of 50,000,000 generations, with genealogies sampled every 5000 generations and the first 10% discarded as burn‐in. Convergence and effective sample size (ESS) values were examined in TRACER v1.5 (Drummond & Rambaut, 2007). All parameters had ESS values >200, and additional runs gave similar results. The maximum clade credibility tree with mean heights for branches was estimated in the program TREEANNOTATOR (Drummond & Rambaut, 2007) with 10% burn‐in and was visualized and edited in the program FIGTREE v1.3.1 (Andrew Rambaut, University of Edinburgh, http://tree.bio.ed.ac.uk/software/figtree/).
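As a quick check of the sampling scheme, the number of posterior samples it implies can be worked out as follows. This is a sketch of the arithmetic only; the function name is hypothetical, and whether the initial state of the chain is also logged is not considered.

```python
def mcmc_sample_counts(chain_length: int, sample_every: int, burnin_percent: int) -> tuple:
    """Return (total, discarded, retained) sample counts for an MCMC run."""
    total = chain_length // sample_every
    # Integer arithmetic avoids floating-point rounding in the burn-in count.
    discarded = total * burnin_percent // 100
    return total, discarded, total - discarded


# Settings stated in the text: 50,000,000 generations, sampled every 5000, 10% burn-in.
total, discarded, retained = mcmc_sample_counts(50_000_000, 5_000, 10)
print(total, discarded, retained)  # 10000 1000 9000
```

Under these settings, 9000 sampled genealogies remain after burn‐in.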
Divergence time between the different geographical groups was estimated from the full mitogenomes using the IMa software (Hey & Nielsen, 2007). Only groups with more than five individuals were analyzed to avoid spurious results due to low sample size. IMa infers the divergence time between two populations that descend from a common ancestral population. It assumes constant population sizes and no intragenic recombination; the latter assumption holds for mtDNA, in which recombination does not occur. A model of isolation without migration (m = 0) was chosen to mimic the founding of the Danish populations. Population sizes (q) were set to 300 and divergence time (t) to 20. A geometric heating scheme with 15 MCMC chains and the parameters g1 = 0.7, g2 = 0.9 and L = 3.0, with a burn‐in of 30,000, was used to explore the parameter space. Convergence was evaluated from ESS values and by performing two runs for each pairwise estimate of divergence time, which gave similar results. The dataset was analyzed using the HKY substitution model and an inheritance scalar of 0.25 to account for the lower effective size of mtDNA compared with nuclear DNA. As in *BEAST, the fast and slow mutation rates were used to convert divergence time estimates to years.
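The conversion of a mutation‐scaled divergence time to years can be sketched as follows. The sketch assumes that the scaled estimate equals the divergence time in years multiplied by the mutation rate per locus per year, so that dividing by that rate recovers years. The scaled estimate, the two per‐site rates and the sequence length below are hypothetical placeholders, not values from this study.

```python
def divergence_time_years(t_scaled: float, rate_per_site_per_year: float,
                          sequence_length: int) -> float:
    """Convert a mutation-scaled divergence time to years.

    Assumes t_scaled = T * u, where u is the mutation rate per locus per year
    (per-site rate multiplied by the number of sites in the locus).
    """
    u_per_locus = rate_per_site_per_year * sequence_length
    return t_scaled / u_per_locus


# Hypothetical inputs for illustration only.
T_SCALED = 0.5          # placeholder scaled divergence time estimate
SEQ_LENGTH = 16_000     # placeholder alignment length in base pairs
SLOW_RATE = 1e-8        # placeholder slow rate, substitutions/site/year
FAST_RATE = 5e-8        # placeholder fast rate, substitutions/site/year

years_slow = divergence_time_years(T_SCALED, SLOW_RATE, SEQ_LENGTH)
years_fast = divergence_time_years(T_SCALED, FAST_RATE, SEQ_LENGTH)
print(f"slow rate: {years_slow:.0f} years; fast rate: {years_fast:.0f} years")
```

Because the rate appears in the denominator, the slow rate gives the older divergence date and the fast rate the younger one, so the two rates together bracket the estimate.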