Inference of Migration Events

Francesco Mercati
Gabriella De Lorenzis
Antonio Mauceri
Marcello Zerbo
Lucio Brancadoro
Claudio D'Onofrio
Caterina Morcia
Maria Gabriella Barbagallo
Cristina Bignami
Massimo Gardiman
Laura de Palma
Paola Ruffa
Vittorino Novello
Manna Crespan
Francesco Sunseri

A maximum likelihood (ML) tree of the collected data and a model of gene flow among the geographic groups (Table 1) were inferred with TreeMix (Pickrell and Pritchard, 2012). Stratified allele frequencies from PLINK were converted into the TreeMix format with the plink2treemix.py script and used as input (https://speciationgenomics.github.io/Treemix/; Pickrell and Pritchard, 2012). Forty independent ML searches were performed following the procedure described by Zecca et al. (2020). The results were filtered by their likelihood values using R/cfTrees (Zecca et al., 2020), duplicates were removed, and the best-scoring ML tree was retained. Gene flow among the groups was then investigated by adding migration events (m) to the tree. Each number of migration edges from 1 to 5 was tested 10 times, with a different random seed each time and using blocks (k) of 20 SNPs, to check for convergence of the likelihood value of each model; the variance explained by each model was also recorded. Standard errors (SE) and bootstrap replicates (bootstrap) were used to assess confidence in the inferred tree topology and in the weight of the migration events. To automate the choice of the best number of migration events, an ad hoc statistic based on the second-order rate of change in the likelihood, weighted by the standard deviation (SD), was computed with R/OptM (Fitak, 2019). However, because a model is regarded as adequate when its migration edges (m) explain 99.8% of the variance in ancestry between groups (Pickrell and Pritchard, 2012), only the model reaching this cutoff was accepted as the best one. In addition, only runs in which all incorporated migration edges were statistically significant were considered. The residuals of the fitted models chosen for our data were visualized using the R function plot_resid.
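The run grid described above (m from 1 to 5, 10 seeds per value, k = 20) can be sketched as a small Python helper that only builds the TreeMix command lines. This is an illustrative sketch, not the authors' script: the file names, the seeding scheme, and the function names are hypothetical. The choice to return the smallest m that reaches the 99.8% cutoff is also an assumption. The flags -i, -o, -m, -k, -seed, -se and -bootstrap are standard TreeMix options.

```python
from itertools import product


def treemix_commands(infile, out_prefix, edges=range(1, 6), replicates=10,
                     block_size=20, base_seed=1000, bootstrap=False):
    """Build one TreeMix argument list per (m, replicate) pair.

    Nothing is executed here; each list can be passed to subprocess.run().
    """
    commands = []
    for m, rep in product(edges, range(1, replicates + 1)):
        # Hypothetical seeding scheme: distinct per run for up to 99 replicates.
        seed = base_seed + 100 * m + rep
        args = [
            "treemix",
            "-i", infile,            # allele-frequency file from plink2treemix.py
            "-m", str(m),            # number of migration edges
            "-k", str(block_size),   # SNPs per block
            "-seed", str(seed),      # random seed for this run
            "-se",                   # report standard errors of migration weights
            "-o", f"{out_prefix}.m{m}.rep{rep}",
        ]
        if bootstrap:
            args.append("-bootstrap")  # resample blocks for a bootstrap replicate
        commands.append(args)
    return commands


def smallest_m_reaching_cutoff(variance_by_m, cutoff=0.998):
    """Return the smallest m whose explained variance meets the cutoff, else None.

    variance_by_m maps m to the fraction of variance explained (0 to 1).
    Picking the smallest passing m is an assumption made for this sketch.
    """
    passing = [m for m, f in variance_by_m.items() if f >= cutoff]
    return min(passing) if passing else None


if __name__ == "__main__":
    # Hypothetical input and output names.
    cmds = treemix_commands("groups.treemix.frq.gz", "treemix_out")
    print(len(cmds), "runs planned")
    print(" ".join(cmds[0]))
```

Building the commands as argument lists keeps the grid easy to inspect before anything is launched, and the explained-variance values passed to the second helper would come from the fitted covariance and residual matrices of each run.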
