Although descriptive analyses of genetic variation and phylogenies are useful for identifying patterns and comparing hypotheses, Approximate Bayesian Computation (ABC) allows alternative scenarios to be compared quantitatively through simulation, and allows the posterior distributions of key parameters to be estimated. Because such simulation approaches rely on assumptions about many model parameters, they are most valuable when used alongside other methods that do not depend on highly parameterized models. To better understand the evolutionary history of A. k. occidentalis and A. nimapuna, we therefore performed coalescent simulations in an ABC framework using DIYABC v.1.0.4.46 (Cornuet et al., 2008). We simulated many thousands of genealogies, retained the simulations that produced patterns of genetic variation closest to the empirical data, and used these retained simulations to discriminate among a set of alternative historical scenarios.
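The simulate-and-retain logic described above can be illustrated with a minimal rejection sketch. This is not DIYABC code: the two toy scenarios, their Gaussian "summary statistics", and the function name `rejection_abc_model_choice` are hypothetical. The sketch simulates equal numbers of data sets under each scenario, standardizes each statistic, keeps the fraction of simulations closest to the observed values, and uses each scenario's share of the retained simulations as a simple proportion-based estimate of its posterior probability (no logistic regression adjustment is applied).

```python
import math
import random
import statistics
from collections import Counter


def rejection_abc_model_choice(observed, simulators, n_per_model, tolerance, seed=0):
    """Proportion-based (rejection) estimate of each scenario's posterior probability.

    observed:    tuple of observed summary statistics
    simulators:  dict mapping scenario name to a function(rng) that returns a
                 tuple of simulated summary statistics (same order as observed)
    n_per_model: simulations per scenario (i.e. equal prior weight per scenario)
    tolerance:   fraction of all simulations, closest to observed, to retain
    """
    rng = random.Random(seed)
    sims = [(simulate(rng), name)
            for name, simulate in simulators.items()
            for _ in range(n_per_model)]

    # Standardize each statistic by its SD over all simulations so that
    # statistics on different scales contribute comparably to the distance.
    k = len(observed)
    sds = [statistics.pstdev([stats[j] for stats, _ in sims]) or 1.0 for j in range(k)]
    obs_z = [observed[j] / sds[j] for j in range(k)]

    scored = []
    for stats, name in sims:
        z = [stats[j] / sds[j] for j in range(k)]
        scored.append((math.dist(z, obs_z), name))  # Euclidean distance
    scored.sort(key=lambda pair: pair[0])

    n_keep = max(1, int(tolerance * len(scored)))
    kept = Counter(name for _, name in scored[:n_keep])
    # Posterior probability is approximated by each scenario's share of the
    # retained (closest) simulations.
    return {name: kept[name] / n_keep for name in simulators}


# Hypothetical toy scenarios returning (segregating sites, mean pairwise differences).
def constant_size(rng):
    return (rng.gauss(10, 2), rng.gauss(5, 1))


def expansion(rng):
    return (rng.gauss(20, 2), rng.gauss(5, 1))


probs = rejection_abc_model_choice(
    observed=(19.5, 5.0),
    simulators={"constant": constant_size, "expansion": expansion},
    n_per_model=1000,
    tolerance=0.1,
)
print(probs)
```

With real data, each simulated data set would instead come from a coalescent simulation whose parameters are drawn from the priors, and the retained fraction (here 10%) acts as the tolerance that trades bias against Monte Carlo noise.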
For A. k. occidentalis, we partitioned the data into the five clusters recovered by the BAPS analysis and compared six alternative scenarios that considered divergence and changes in population size (Fig. 1A). The A. nimapuna data were partitioned into the two clusters recovered by BAPS, and we compared five alternative scenarios that considered changes in population size (Fig. 1B). The similarity between simulated and empirical data was measured using within- and between-population summary statistics: for single populations, the number of segregating sites, mean pairwise differences, Tajima’s D, and the number of private segregating sites; for pairs of populations, the number of segregating sites, mean pairwise differences within populations, and mean pairwise differences between populations. For each species, 100 000 data sets were simulated per scenario to build a reference table, under a mutation model whose mean rate followed a uniform prior ranging from 1.00 × 10⁻⁹ to 1.00 × 10⁻⁷. A pre-evaluation step based on principal component analysis (PCA) was performed to confirm that the scenarios and priors produced simulated data sets sufficiently similar to the empirical data. The relative posterior probabilities of the competing scenarios were estimated by logistic regression on the 10% of simulated data sets closest to the empirical data (Cornuet et al., 2010), and the scenario with the highest posterior probability was selected as the best-supported model.
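Several of the summary statistics listed above have standard textbook definitions, which the following sketch computes for aligned haplotypes. It is a minimal illustration under stated assumptions, not DIYABC's internal implementation: sequences are assumed to be equal-length strings with no gaps or missing data, each sample is assumed to contain at least two sequences, and private segregating sites are omitted because their exact definition in DIYABC is not given here. Tajima's D uses the usual normalization constants (a1, a2, b1, b2, c1, c2, e1, e2).

```python
import math
from itertools import combinations, product


def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def segregating_sites(seqs):
    """Count alignment columns that carry more than one state."""
    return sum(len(set(col)) > 1 for col in zip(*seqs))


def mean_pairwise_differences(seqs):
    """Average number of differences over all pairs within one sample (pi)."""
    pairs = list(combinations(seqs, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def mean_pairwise_differences_between(seqs_a, seqs_b):
    """Average number of differences over all pairs with one sequence from each sample."""
    pairs = list(product(seqs_a, seqs_b))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def tajimas_d(seqs):
    """Tajima's D for one sample (n >= 2); returns nan if there are no segregating sites."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        return float("nan")
    pi = mean_pairwise_differences(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between the two estimators of theta, scaled by its standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical sample of four haplotypes: 2 segregating sites, pi = 7/6.
pop = ["AAAA", "AAAT", "ACAT", "ACAT"]
print(segregating_sites(pop), mean_pairwise_differences(pop), tajimas_d(pop))
```

In an ABC analysis, the same functions would be applied identically to the empirical data and to every simulated data set, so that distances between statistic vectors compare like with like.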
Figure 1. Simulated historical scenarios tested in DIYABC for (A) the five A. k. occidentalis BAPS clusters and (B) the two A. nimapuna BAPS clusters. In these scenarios, t denotes time measured in generations, and the width of the graph represents the relative effective population size during each time period (e.g. 0–t1).