Population genomic analyses

Leonardo Campagna
Márcio Repenning
Luís Fábio Silveira
Carla Suertegaray Fontana
Pablo L. Tubaro
Irby J. Lovette

We searched for divergent regions of the genome by calculating FST values with VCFtools version 0.1.14 (50) for the five southern capuchino species with sample sizes of 12 individuals (hypox, mel, nig, pal, and pil). We calculated FST across the 10 possible pairwise comparisons among these five species (for example, nig versus mel, hypox versus pal) using three strategies: (i) average FST values for nonoverlapping 25-kb windows, (ii) average FST values for nonoverlapping 5-kb windows on scaffolds of interest, and (iii) FST values for individual SNPs. We built Manhattan plots and conducted PCA in R version 3.3.0 (51) with the packages “qqman” and SNPRelate version 3.3 (52), respectively. The PCA, derived from 11.5 million SNPs, was run both with and without four outlier individuals (two S. melanogaster and two S. pileata; see details in fig. S1). Downstream analyses were conducted with and without these four individuals and produced similar results.
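The windowing step can be illustrated with a minimal Python sketch. It is not the VCFtools implementation: it assumes per-SNP FST values have already been computed and simply takes their unweighted mean within each nonoverlapping window. The function names, the input layout (scaffold, 1-based position, FST), and the choice of an unweighted mean are assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import combinations

# Species codes as used in the text.
SPECIES = ["hypox", "mel", "nig", "pal", "pil"]


def pairwise_comparisons(species):
    """Return every unordered pair of species (10 pairs for five species)."""
    return list(combinations(species, 2))


def window_mean_fst(sites, window_size=25_000):
    """Average per-SNP FST in nonoverlapping windows.

    sites: iterable of (scaffold, pos, fst) with 1-based positions
           (hypothetical input layout).
    Returns {(scaffold, window_start): (mean_fst, n_snps)}.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for scaffold, pos, fst in sites:
        if math.isnan(fst):
            continue  # skip sites where FST is undefined
        start = ((pos - 1) // window_size) * window_size + 1
        acc = sums[(scaffold, start)]
        acc[0] += fst
        acc[1] += 1
    return {key: (total / n, n) for key, (total, n) in sums.items()}
```

Calling `window_mean_fst(sites, window_size=5_000)` on a single scaffold would correspond to the finer-scale strategy (ii) under the same assumptions.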

We identified divergence peaks in the 10 pairwise comparisons using the average FST values calculated for the nonoverlapping 25-kb windows, discarding regions with fewer than two windows and windows with fewer than 10 SNPs. We took a conservative approach and selected only regions with an FST value above 0.2. Because the average FST across all comparisons was 0.008, these criteria selected only regions that fall between 12 and 13 SDs above the FST mean. We subsequently narrowed our selection of candidate regions by retaining only those that had at least one individual SNP with an FST of 0.85 or higher. We thus filtered out regions with an elevated average FST that did not contain individual outlier sites that could be putative targets of selection. We identified a total of 25 divergent regions across the 10 possible pairwise FST comparisons.
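The filtering criteria above can be sketched as follows. This is one possible reading, not the authors' code: it assumes that a "region" is a run of adjacent elevated windows on the same scaffold, which the text does not state explicitly. The function name, parameter names, and input layouts are hypothetical.

```python
def find_divergent_regions(windows, snps, window_size=25_000,
                           min_snps=10, fst_cutoff=0.2,
                           min_windows=2, snp_cutoff=0.85):
    """Select candidate divergence peaks for one pairwise comparison.

    windows: {(scaffold, window_start): (mean_fst, n_snps)}
    snps:    list of (scaffold, pos, fst) for individual SNPs
    Returns a list of (scaffold, region_start, region_end).
    """
    # Keep windows with enough SNPs and a mean FST above the cutoff.
    elevated = sorted(
        key for key, (mean_fst, n) in windows.items()
        if n >= min_snps and mean_fst > fst_cutoff
    )

    # Merge adjacent elevated windows into regions (assumed definition).
    regions = []  # each entry: [scaffold, start, end, n_windows]
    for scaffold, start in elevated:
        last = regions[-1] if regions else None
        if last and last[0] == scaffold and last[2] + 1 == start:
            last[2] = start + window_size - 1
            last[3] += 1
        else:
            regions.append([scaffold, start, start + window_size - 1, 1])

    kept = []
    for scaffold, start, end, n_windows in regions:
        if n_windows < min_windows:
            continue  # discard regions with fewer than two windows
        # Require at least one individual SNP at or above the SNP cutoff.
        has_outlier = any(
            s == scaffold and start <= pos <= end and fst >= snp_cutoff
            for s, pos, fst in snps
        )
        if has_outlier:
            kept.append((scaffold, start, end))
    return kept
```

Running this once per pairwise comparison and pooling the results would give the set of candidate regions under these assumptions.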

We estimated absolute sequence divergence by calculating the summary statistic Dxy for each site and obtaining an average for nonoverlapping 5-kb windows with a custom Perl script. Dxy was calculated as the minor allele frequency in species A times the major allele frequency in species B plus the product of the major allele frequency in species A and the minor allele frequency in species B. The per-site minor allele frequency was obtained using ANGSD version 0.911 (53).
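The per-site calculation described above can be written compactly. The sketch below is a Python illustration rather than the authors' Perl script. It assumes biallelic sites and that the two frequencies refer to the same allele in both species, and it averages over the sites supplied rather than over all bases in the window; the text does not specify the denominator, so that choice is an assumption.

```python
from collections import defaultdict


def dxy_site(p_a, p_b):
    """Per-site Dxy for a biallelic site.

    p_a, p_b: frequency of the same allele in species A and species B.
    Dxy = p_a * (1 - p_b) + (1 - p_a) * p_b
    """
    return p_a * (1.0 - p_b) + (1.0 - p_a) * p_b


def dxy_windows(sites, window_size=5_000):
    """Mean per-site Dxy in nonoverlapping windows.

    sites: iterable of (scaffold, pos, p_a, p_b), 1-based positions
           (hypothetical input layout).
    Returns {(scaffold, window_start): mean_dxy}.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for scaffold, pos, p_a, p_b in sites:
        start = ((pos - 1) // window_size) * window_size + 1
        acc = sums[(scaffold, start)]
        acc[0] += dxy_site(p_a, p_b)
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}
```

A site fixed for different alleles in the two species gives a per-site value of 1, and a site fixed for the same allele in both gives 0.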

We estimated LD using VCFtools to calculate the r2 statistic. The calculations were carried out with the 99 SNPs that showed fixed differences (FST = 1) in at least one pairwise comparison between species. When more than one pair of sites was compared between two peaks, we recorded the average and the highest r2 values. Calculations were conducted for each species separately and for all taxa pooled together. For the former, we included one outgroup individual from each of the remaining species because, in many cases, the position was not variable within species and otherwise could not be used to calculate LD.
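To illustrate why an invariant position cannot contribute to an LD estimate, here is a minimal Python sketch of a genotype-based r2, computed as the squared Pearson correlation between alternate-allele counts (0, 1, or 2) at two sites. The text does not say which VCFtools r2 option was used, so the genotype-based form is an assumption, as are the function names and input layout.

```python
from itertools import product
from statistics import mean


def genotype_r2(g1, g2):
    """Squared correlation between genotypes at two sites.

    g1, g2: per-individual alternate-allele counts (0, 1, 2), with None
            for missing calls. Returns None when r2 is undefined, for
            example when a site is invariant among the individuals used.
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    s_aa = sum((a - mean_a) ** 2 for a, _ in pairs)
    s_bb = sum((b - mean_b) ** 2 for _, b in pairs)
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    if s_aa == 0 or s_bb == 0:
        return None  # no variation at one site: r2 cannot be computed
    return (s_ab * s_ab) / (s_aa * s_bb)


def summarize_peak_pair(peak1_sites, peak2_sites):
    """Average and highest r2 over all site pairs between two peaks.

    peak1_sites, peak2_sites: lists of genotype vectors, one per SNP.
    Returns (mean_r2, max_r2), or None if no pair yields a defined r2.
    """
    values = []
    for g1, g2 in product(peak1_sites, peak2_sites):
        r2 = genotype_r2(g1, g2)
        if r2 is not None:
            values.append(r2)
    if not values:
        return None
    return mean(values), max(values)
```

In this sketch, a site that is fixed within a species returns `None` until at least one individual carrying the other allele is added, which mirrors the reason given above for including outgroups in the per-species calculations.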
