In order to test our method on realistic data for which the complete genealogical history is known, we simulated the evolution of recombining chromosomes using the coalescent simulator msms (Ewing and Hermisson 2010). Simulations involved four taxa, each represented by 10 haploid sequences (40 in total) that split in the order {[(A,B),C],D} (Figure 2A), or five taxa, each represented by six sequences, that split in the order {[(A,B),(C,D)],E} (Figure S7A in File S2), with split times of 0.5, 1, and 1.5 (in units of 4N generations). Population size was constant throughout. In all scenarios, unidirectional migration from C to B was simulated. The simulation was performed for a 1 Mb chromosome, with a population recombination rate (4Nr) of 0.01 or 0.001. Genealogies for each unique ancestry block (separated by recombinations) were recorded. These were used to calculate the true weightings using Twisst.
Tests on simulated chromosomes. (A) In all three demographic scenarios, populations split in the order {[(A,B),C],D}, at the split times indicated (in units of 4N generations), with migration from C to B (indicated by arrows). In the Neutral scenario, there is no selection and moderate migration. The Adaptive Introgression scenario is similar, except a beneficial allele at a locus in the center of the chromosome is allowed to move from population C into B at time 0.1. In the Barrier Locus scenario, the rate of migration is high, but an allele at the central locus that is fixed in C is selected against in population B. (B) Mean weightings for the three possible taxon topologies across the 1 Mb simulated chromosome. Note that we illustrate the topologies for the four taxa as rooted, with D as the outgroup for simplicity, but the rooting is not considered when computing the weightings. (C) Weightings for all three topologies plotted (stacked) across the chromosome, with loess smoothing (span = 20 kb). (D) Weightings for topology {[(A,B),C],D} inferred from simulated sequence data using nonoverlapping 50 SNP windows and neighbor joining. Solid blue lines indicate the true values, and dashed black lines indicate the inferred values. Gray shading indicates the lower (5%) and upper (95%) quantiles based on 100 bootstrap replicates. Values are smoothed as above.
Three distinct evolutionary scenarios were simulated (Figure 2A and Figure S7A in File S2). msms command options are provided in File S1. The first was a “Neutral” scenario, with no selection and a low migration rate from B to C of 0.1 (in units of 4Nm, where m is the fraction of B made up of migrants from C each generation). The second was an “Adaptive Introgression” scenario, which is the same as above except that a beneficial allele at a locus in the center of the 1 Mb chromosome is allowed to move from population C into B at time 0.1. This was achieved by initiating selection at this time point on a dominant allele that was fixed in population C and absent from the other populations. A selection coefficient of 0.005 was used for both the homozygote and heterozygote, with a diploid population size of 100,000, giving a selection strength (2Ns) of 1000 for both genotypes. The third was a “Barrier Locus” scenario, where the rate of migration from C to B was five (in units of 4Nm, as above), and a dominant allele at the central locus that is fixed in C is selected against in population B. The same selection coefficient and population size as above were used.
We simulated sequences from the simulated genealogies using seq-gen (Rambaut and Grass 1997). Command options are provided in File S1. The branch scaling factor for mutation was 0.01. Since branches in the simulated genealogies are in units of 4N generations, taking N to be 100,000 gives μ (the per generation mutation rate) of 2.5 × 10−8.
Do you have any questions about this protocol?
Post your question to gather feedback from the community. We will also invite the authors of this article to respond.