Running fastsimcoal2

Parul Johri
Kellen Riall
Hannes Becher
Laurent Excoffier
Brian Charlesworth
Jeffrey D. Jensen

Inference was performed by masking all exonic SNPs and using all intronic and intergenic SNPs in order to obtain the most accurate estimates. To minimize the effects of linkage disequilibrium (LD), and to assess the impact of violating the assumption of independence among SNPs, inference was also performed in some cases using only SNPs separated by at least 5 kb or at least 100 kb. To choose SNPs separated by a given distance, the first SNP on each chromosome was selected; if the distance to the next consecutive SNP was greater than or equal to the threshold (5 kb or 100 kb), that SNP was included, otherwise the next downstream SNP was evaluated.

Site frequency spectra (SFS) were obtained for all sets of SNPs for all ten replicates of every combination of demographic history and DFE. SNPs from all 22 chromosomes were pooled to calculate the SFS. For SNPs separated by 5 kb or 100 kb, the "0" class of the SFS was scaled down by the same proportion as the decrease in the total number of SNPs.

Fastsimcoal2 was used to fit each SFS to four distinct models: (a) equilibrium, which estimates a single population size parameter (N); (b) instantaneous size change (decline/growth), which fits three parameters: ancestral population size (Nanc), current population size (Ncur), and time of change (T); (c) exponential size change (decline/growth), which also estimates three parameters: Nanc, Ncur, and T; and (d) an instantaneous bottleneck model with three parameters: Nanc, intensity, and time of bottleneck. In all cases, the search ranges for both ancestral and current population sizes were uniform between 100 and 500,000 individuals, and the search range for the time of change was uniform between 100 and 10,000 generations. The intensity of the bottleneck was sampled from a log-uniform distribution between 10⁻⁵ and 2. The following command line was used to run fastsimcoal2:

fsc26 -t demographic_model.tpl -n 150000 -d -e demographic_model.est -M -L 50 -q
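The SNP thinning and the rescaling of the "0" class described above can be sketched in Python as follows. This is a minimal illustration, not the authors' script: the function names `thin_snps` and `rescale_zero_class` are hypothetical, positions are assumed to be sorted base-pair coordinates on a single chromosome, and the distance is assumed to be measured from the most recently retained SNP.

```python
def thin_snps(positions, min_dist):
    """Greedily keep SNPs that are at least `min_dist` bp apart.

    `positions` must be sorted ascending and come from one chromosome.
    The first SNP is always kept; each later SNP is kept only if it lies
    at least `min_dist` bp downstream of the last SNP that was kept
    (an assumption about how the distance is measured).
    """
    kept = []
    for pos in positions:
        if not kept or pos - kept[-1] >= min_dist:
            kept.append(pos)
    return kept


def rescale_zero_class(zero_count, n_snps_all, n_snps_thinned):
    """Scale the monomorphic ("0") SFS class by the fraction of SNPs retained."""
    return zero_count * n_snps_thinned / n_snps_all


# Toy coordinates, invented purely for illustration.
example_positions = [100, 2000, 5100, 9000, 10100, 20000]
thinned = thin_snps(example_positions, 5000)
scaled_zero = rescale_zero_class(1_000_000, len(example_positions), len(thinned))
```

In practice `thin_snps` would be applied to each chromosome separately, and the retained SNPs pooled before computing the SFS.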

Model selection was performed as recommended by Excoffier et al. (2013). For each demographic model, the highest of the maximum likelihoods across all replicates was used to calculate the Akaike Information Criterion: AIC = 2 × (number of parameters) − 2 × ln(likelihood) = 2 × (number of parameters) − 2 × ln(10) × L10, where L10 is the base-10 logarithm of the best likelihood reported by fastsimcoal2. For model choice comparison, we also implemented a stricter penalty of 25× (see supplementary tables 5 and 6, Supplementary Material online), in which case AIC = 25 × (number of parameters) − 2 × ln(likelihood). The relative likelihoods (Akaike's weight of evidence) in favor of the ith model were then calculated as:

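$$w_i = \frac{\exp(-\Delta_i/2)}{\sum_{j} \exp(-\Delta_j/2)}$$

where Δ_i = AIC_i − AIC_min and the sum runs over all models compared. The model with the highest relative likelihood was selected as the best model, and the parameters estimated using that model were used to plot the final inferred demography.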
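As a rough sketch of the model-selection arithmetic, the Python below converts base-10 log-likelihoods into AIC values and then into Akaike weights. It assumes the standard Akaike weight form, exp(−Δ_i/2) normalized over all models compared. The function names and all numeric likelihoods are hypothetical and chosen only for illustration; only the parameter counts (1 for equilibrium, 3 for the other models) come from the text.

```python
import math


def aic_from_log10(log10_lik, n_params, penalty=2.0):
    """AIC = penalty * k - 2 * ln(10) * L10, with L10 the log10 likelihood.

    penalty=2.0 is the usual AIC; penalty=25.0 gives the stricter variant.
    """
    return penalty * n_params - 2.0 * math.log(10) * log10_lik


def akaike_weights(aics):
    """Map {model: AIC} to {model: weight}, assuming the standard weight form."""
    best = min(aics.values())
    rel = {m: math.exp(-0.5 * (a - best)) for m, a in aics.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


# Hypothetical best log10 likelihoods per model (max over replicates).
best_log10 = {
    "equilibrium": -52010.0,
    "instantaneous": -52000.5,
    "exponential": -52001.0,
    "bottleneck": -52003.0,
}
n_params = {"equilibrium": 1, "instantaneous": 3, "exponential": 3, "bottleneck": 3}

aics = {m: aic_from_log10(best_log10[m], n_params[m]) for m in best_log10}
weights = akaike_weights(aics)
best_model = max(weights, key=weights.get)
```

Subtracting the minimum AIC before exponentiating keeps the computation numerically stable when the raw AIC values are large.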
