3.1.1. Migration simulations

This protocol is extracted from research article:

Can Bayesian phylogeography reconstruct migrations and expansions in linguistic evolution?

**R Soc Open Sci**, Jan 13, 2021; DOI: 10.1098/rsos.201079

Procedure

We model migrations as directional random walks in which languages are represented by points in space. These points move stochastically with a bias in one direction. The direction and strength of this bias are controlled by the parameter *μ*. Examples for different levels of *μ* are displayed in figure 2. The other parameters of the MigSim are *T*, the total time span of the simulation (from the first split to the current time); *N*, the expected number of leaves (i.e. sampled languages) in the tree; and *σ*, the expected distance covered due to undirected movements over the whole expansion period. Here, we set these parameters to values that seem realistic for the expansion of a language family (*T* = 5000 years, *N* = 100 nodes and *σ* = 2000 km). The exact values do not matter for our findings, however: the results show more generally how the reconstruction quality changes as *μ* increases for a fixed *σ*. A sensitivity analysis on the number of nodes shows that varying the tree size does not change our findings (see electronic supplementary material, S6).

Figure 2. Examples of three simulations and the corresponding reconstructions. The top row shows the simulated trees plotted in space, with the root marked by a blue star. The bottom row shows the reconstructed trees, with the root marked in red. The columns represent different levels of directional trend in the simulation, increasing from left to right (*μ* = 0, *σ*, 2*σ*).

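We implement directional random walks in discrete time steps of duration $\Delta t$ (set to 1 year in our experiments). In every time step, each language makes a random move according to a Gaussian distribution

$$x_{t+\Delta t} \sim \mathcal{N}\left(x_{t} + \mu_{\mathrm{step}},\ \sigma_{\mathrm{step}}^{2}\right),$$

where $x_{t}$ denotes the location of the language at time $t$.

The free parameters in this process are the step mean $\mu_{\mathrm{step}}$ and the step variance $\sigma_{\mathrm{step}}^{2}$. Since these steps are arbitrary units of our simulation, we aggregate them into more meaningful quantities: the total bias *μ* and the total standard deviation (or total expected diffusion distance) *σ*, defined as

$$\mu = \frac{T}{\Delta t}\,\mu_{\mathrm{step}}$$

and

$$\sigma = \sqrt{\frac{T}{\Delta t}}\,\sigma_{\mathrm{step}}.$$

This gives rise to a reformulated step distribution

$$x_{t+\Delta t} \sim \mathcal{N}\left(x_{t} + \mu\,\frac{\Delta t}{T},\ \sigma^{2}\,\frac{\Delta t}{T}\right).$$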
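The step distribution expressed in terms of the totals *μ* and *σ* can be illustrated with a minimal Python sketch. It assumes that each step has mean *μ*·Δt/*T* and variance *σ*²·Δt/*T*, that locations are planar coordinates in km, and that *σ* applies independently to each coordinate axis. The function name `simulate_walk` and its arguments are hypothetical and are not taken from the authors' code.

```python
import math
import random


def simulate_walk(mu, sigma, total_time=5000.0, dt=1.0, start=(0.0, 0.0), rng=None):
    """Simulate one 2D directional random walk (illustrative sketch).

    mu         -- (mu_x, mu_y): total bias over total_time, in km
    sigma      -- total standard deviation over total_time, in km
                  (assumption: isotropic, applied to each axis separately)
    total_time -- T, the simulated time span in years
    dt         -- duration of one time step in years
    """
    rng = rng or random.Random()
    n_steps = int(round(total_time / dt))
    frac = dt / total_time

    # Per-step parameters recovered from the totals:
    # mu_step = mu * dt / T, sigma_step = sigma * sqrt(dt / T).
    step_mean = (mu[0] * frac, mu[1] * frac)
    step_sd = sigma * math.sqrt(frac)

    x, y = start
    path = [(x, y)]
    for _ in range(n_steps):
        x += rng.gauss(step_mean[0], step_sd)
        y += rng.gauss(step_mean[1], step_sd)
        path.append((x, y))
    return path


if __name__ == "__main__":
    sigma = 2000.0
    # Trend levels used in the figure: mu = 0, sigma and 2 * sigma (here along x).
    for mu_x in (0.0, sigma, 2.0 * sigma):
        end_x, end_y = simulate_walk((mu_x, 0.0), sigma, rng=random.Random(1))[-1]
        print(f"mu = {mu_x:6.0f} km -> final position ({end_x:8.1f}, {end_y:8.1f}) km")
```

Because the per-step mean and variance scale with Δt/*T*, the distribution of the final position depends only on *μ* and *σ* and not on the chosen step length, which is the point of the reformulation.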

At the same time, each language has a certain probability of splitting into two new languages, which then continue to undergo independent random walks. To simulate historical data from extinct languages (fossils), each language also has a certain probability of going extinct. This is a common birth–death process, controlled by the birth rate *λ* and the death rate *ν*. We set *ν* = ln *N*/(4*T*) ≈ 0.00023 and *λ* = 5*ν* ≈ 0.00115 per year. The net growth rate is then *λ* − *ν* = 4*ν* = ln *N*/*T*, so after *T* = 5000 years the expected number of languages is $e^{(\lambda - \nu)T} = N = 100$. Including a death rate in the model allows us to use extinct languages in some of the experiment settings (see electronic supplementary material, S1). To ensure comparability between the scenarios, we remove outliers from the simulation results. We consider a result an outlier if the number of extant languages is below 40 or above 200.
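The rate arithmetic and the outlier rule can be checked with a short Python sketch. It is an illustration under stated assumptions, not the authors' implementation: it tracks only the number of extant languages (no locations or tree), starts from a single lineage, and treats *λ*·Δt and *ν*·Δt as per-step probabilities of splitting and extinction. All function names are hypothetical.

```python
import math
import random

T = 5000.0                   # total time span in years
N = 100                      # target expected number of languages
NU = math.log(N) / (4 * T)   # death rate per year, about 0.00023
LAM = 5 * NU                 # birth rate per year, about 0.00115


def expected_lineages(t, birth=LAM, death=NU):
    """Expected number of lineages at time t, starting from one lineage."""
    return math.exp((birth - death) * t)


def simulate_extant_count(total_time=T, birth=LAM, death=NU, dt=1.0, rng=None):
    """Discrete-time birth-death sketch that returns the extant lineage count.

    Assumption: in each step a lineage splits with probability birth * dt,
    goes extinct with probability death * dt, and otherwise persists.
    """
    rng = rng or random.Random()
    p_split = birth * dt
    p_die = death * dt
    n = 1
    for _ in range(int(round(total_time / dt))):
        change = 0
        for _ in range(n):
            u = rng.random()
            if u < p_split:
                change += 1      # one lineage becomes two
            elif u < p_split + p_die:
                change -= 1      # lineage goes extinct
        n += change
        if n == 0:               # the whole family died out
            break
    return n


def is_outlier(n_extant, low=40, high=200):
    """Outlier rule from the text: fewer than 40 or more than 200 extant languages."""
    return n_extant < low or n_extant > high


if __name__ == "__main__":
    print(f"nu = {NU:.5f}, lambda = {LAM:.5f}")
    print(f"expected languages after T: {expected_lineages(T):.1f}")
    rng = random.Random(0)
    counts = [simulate_extant_count(rng=rng) for _ in range(20)]
    kept = [c for c in counts if not is_outlier(c)]
    print(f"kept {len(kept)} of {len(counts)} simulated runs")
```

Individual runs scatter widely around the expected value, and some die out entirely, which is why a filter on the number of extant languages is needed to keep scenarios comparable.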

Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.
