
3.1.1. Migration simulations
This protocol is extracted from research article:
Can Bayesian phylogeography reconstruct migrations and expansions in linguistic evolution?
R Soc Open Sci, Jan 13, 2021;

Procedure

We model migrations as directional random walks, where languages are represented by points in space. These points move stochastically with a bias in one direction. The direction and strength of this bias are controlled by the parameter μ. Examples for different levels of μ are displayed in figure 2. The other parameters of the migration simulation (MigSim) are T, the total time span of the simulation (from the first split to the current time); N, the expected number of leaves (i.e. sampled languages) in the tree; and σ, the expected distance covered due to undirected movements over the whole expansion period. Here, we set these parameters to values that seem realistic for the expansion of a language family (T = 5000 years, N = 100 nodes and σ = 2000 km). However, we emphasize that the exact values do not matter for our findings: the results show more generally how the reconstruction quality changes as μ increases for a fixed σ. A sensitivity analysis on the number of nodes shows that varying the tree size does not change our findings (see electronic supplementary material, S6).

Figure 2. Examples of three simulations and corresponding reconstructions. The top row shows the simulated trees plotted in space, with the root marked by a blue star. The bottom row shows the reconstructed trees, with the root in red. The columns represent different levels of directional trend in the simulation; the trend increases from left to right (μ = 0, σ, 2σ).

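We implement directional random walks in discrete time steps of duration Δt (in our experiments set to 1 year). In every time step, each language makes a random move Δx drawn from a Gaussian distribution

$$\Delta x \sim \mathcal{N}\left(\mu_{\text{step}},\, \sigma_{\text{step}}^{2}\right).$$

The free parameters in this process are the step mean $\mu_{\text{step}}$ and the step variance $\sigma_{\text{step}}^{2}$. Since these steps are arbitrary units of our simulation, we aggregate them into more meaningful quantities: the total bias μ and the total standard deviation (or total expected diffusion distance) σ, defined as

$$\mu = \frac{T}{\Delta t}\,\mu_{\text{step}}$$

and

$$\sigma = \sqrt{\frac{T}{\Delta t}}\,\sigma_{\text{step}}.$$

This gives rise to a reformulated step distribution

$$\Delta x \sim \mathcal{N}\left(\frac{\Delta t}{T}\,\mu,\; \frac{\Delta t}{T}\,\sigma^{2}\right).$$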
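The aggregation of per-step quantities into the total bias μ and total standard deviation σ can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a flat two-dimensional plane with coordinates in km, a bias vector `mu`, isotropic noise applied independently to each axis, and per-step mean and variance that scale with Δt/T. The function names `step_parameters` and `directional_walk` are hypothetical.

```python
import math
import random


def step_parameters(mu, sigma, total_time, dt=1.0):
    """Convert the total bias and total diffusion distance into per-step values."""
    n_steps = total_time / dt
    step_mean = tuple(m / n_steps for m in mu)  # mu_step = mu * dt / T
    step_sd = sigma / math.sqrt(n_steps)        # sigma_step = sigma * sqrt(dt / T)
    return step_mean, step_sd


def directional_walk(start, mu, sigma, total_time, dt=1.0, rng=random):
    """Return the end point of one directional random walk on a plane.

    Assumption: each axis receives an independent Gaussian step with the
    same standard deviation (isotropic diffusion).
    """
    step_mean, step_sd = step_parameters(mu, sigma, total_time, dt)
    x, y = start
    for _ in range(round(total_time / dt)):
        x += rng.gauss(step_mean[0], step_sd)
        y += rng.gauss(step_mean[1], step_sd)
    return x, y


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    # Strong eastward trend (mu = sigma = 2000 km) over T = 5000 years.
    print(directional_walk((0.0, 0.0), (2000.0, 0.0), 2000.0, 5000.0, rng=rng))
```

Because the means and variances of independent Gaussian steps add, the end point of such a walk has expected displacement μ and per-axis standard deviation σ, regardless of the choice of Δt.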

At the same time, each language has a certain probability of splitting into two new languages, which then continue to undergo independent random walks. To simulate historical data from extinct languages (fossils), each language also has a certain probability of going extinct. This is a common birth–death process, controlled by the birth rate λ and the death rate ν. We set ν = ln N/(4T) = 0.00023 and λ = 5ν = 0.00115, which after T = 5000 years results in an expected number of languages of N = 100. Including a death rate in the model allows us to use extinct languages in some of the experimental settings (see electronic supplementary material, S1). To ensure comparability between the scenarios, we remove outliers from the simulation results. We consider a result an outlier if the number of extant languages is below 40 or above 200.
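The choice of rates can be checked directly: since λ − ν = 4ν = ln N/T, the expected number of extant lineages descending from a single ancestor after time T is exp((λ − ν)T) = N. The Python sketch below computes the rates, simulates only the lineage count (not the tree or the locations) and applies the outlier rule. It is an illustration, not the authors' code. It assumes the process starts from one lineage and that, in each time step, a lineage splits with probability λΔt or goes extinct with probability νΔt. All function names are hypothetical.

```python
import math
import random


def birth_death_rates(n_expected, total_time):
    """Return (birth, death) rates with death = ln(N) / (4T) and birth = 5 * death."""
    death = math.log(n_expected) / (4.0 * total_time)
    return 5.0 * death, death


def simulate_extant_count(birth, death, total_time, dt=1.0, rng=random):
    """Simulate the number of extant lineages after total_time.

    Assumption: discrete-time approximation in which each lineage, per
    step, splits with probability birth*dt or dies with probability death*dt.
    """
    n = 1  # assumption: a single ancestral language at time 0
    for _ in range(round(total_time / dt)):
        change = 0
        for _ in range(n):
            u = rng.random()
            if u < birth * dt:
                change += 1      # split into two languages
            elif u < (birth + death) * dt:
                change -= 1      # extinction
        n += change
        if n == 0:
            break                # the whole family died out
    return n


def is_outlier(n_extant, low=40, high=200):
    """Outlier rule from the text: fewer than 40 or more than 200 extant languages."""
    return n_extant < low or n_extant > high


if __name__ == "__main__":
    lam, nu = birth_death_rates(100, 5000.0)
    rng = random.Random(0)
    counts = [simulate_extant_count(lam, nu, 5000.0, rng=rng) for _ in range(20)]
    kept = [c for c in counts if not is_outlier(c)]
    print(f"kept {len(kept)} of {len(counts)} runs")
```

Runs in which the family dies out early return a count of 0 and are discarded by the same outlier rule.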
