Tests of linkage and neutrality

Janna R. Willoughby
Jamie A. Ivy
Robert C. Lacy
Jacqueline M. Doyle
J. Andrew DeWoody

To obtain unbiased estimates of population GD, we needed to ensure that our SNPs were independent. Using the ‘ld’ function in the snpStats package [21], we estimated D’ for each pair of SNPs and grouped SNPs that had pairwise D’ estimates > 0.8. From these putative linkage groups, we then eliminated any SNP that occurred in more than one group. This process resulted in two sets of SNPs: singletons and those assembled into putative linkage groups. We ultimately analyzed the two sets together by randomly selecting a single SNP from within each linkage group and permuting our statistical models.
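As an illustration of the two steps described above, the following Python sketch computes D’ for one pair of biallelic SNPs and then draws one set of independent SNPs. It is not the authors' code: the study used the ‘ld’ function in snpStats on genotype data, whereas this sketch assumes the haplotype frequency is already known. The function names `d_prime` and `draw_independent_snps` and the SNP labels are hypothetical.

```python
import random


def d_prime(p_ab, p_a, p_b):
    """Return |D'| for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    (Assumes phased haplotype frequencies are known, unlike the study.)
    """
    d = p_ab - p_a * p_b
    # D is scaled by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0


def draw_independent_snps(singletons, linkage_groups, rng):
    """Keep every singleton and one randomly chosen SNP per linkage group."""
    # sorted() makes the draw reproducible when groups are stored as sets.
    return list(singletons) + [rng.choice(sorted(g)) for g in linkage_groups]


# Hypothetical example data.
rng = random.Random(1)
singletons = ["snp1", "snp2"]
groups = [{"snp3", "snp4"}, {"snp5", "snp6", "snp7"}]
subset = draw_independent_snps(singletons, groups, rng)
```

Calling `draw_independent_snps` once per permutation would give a different independent subset each time, which is one plausible reading of how the random selection was combined with the permuted models.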

We also sought to identify SNPs under selection, and did so using simulations in R. We modified a computer program designed to mimic the breeder-selection protocols (i.e., MK, RAN, and DOC) to use SNP data [16] and simulated the expected change in allele frequencies for each SNP locus over 20 generations assuming no selection. We could not generate SNP genotypes for the original founders because of DNA degradation, so we began all of our simulations with the genotypes of founder offspring. Briefly, our program works by drawing on the data available from the actual breeding programs. At each generation, we selected breeders following the captive breeding protocols. In MK populations, breeders were selected by minimizing mean kinship within the population, whereas in RAN populations, breeders were chosen randomly. Because we do not know the genetic underpinnings of the traits selected against in the DOC lines, we used a simple, additive genetic model with biallelic loci that were passed from parent to offspring. We determined the number of simulated offspring using data collected from the captive populations, specific to each breeding protocol: we randomly sampled the number of offspring produced by each simulated parent pair from the observed distribution of offspring successfully weaned by parent pairs in the captive populations. To generate offspring genotypes, we randomly selected one allele at each locus from each parent. Finally, we assigned each offspring a sex, male or female, with a probability of 0.5 for either. A more detailed explanation of our simulation process is available in [16].
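The offspring-generation step can be sketched in Python as follows. This is a minimal illustration rather than the authors' R program: it covers only random breeder selection (as in RAN), and it assumes non-overlapping generations and that each shuffled female is paired with one male. The data layout and the names `make_offspring`, `next_generation`, `allele_frequency`, and `weaned_counts` are hypothetical.

```python
import random


def make_offspring(mother, father, rng):
    """Draw one allele per locus from each parent and assign sex with p = 0.5."""
    genotype = [
        (rng.choice(m_locus), rng.choice(f_locus))
        for m_locus, f_locus in zip(mother["genotype"], father["genotype"])
    ]
    sex = "F" if rng.random() < 0.5 else "M"
    return {"sex": sex, "genotype": genotype}


def next_generation(population, weaned_counts, rng):
    """Simulate one neutral generation under random breeder selection.

    weaned_counts: observed numbers of weaned offspring per pair; each
    simulated pair samples its offspring number from this list.
    """
    females = [ind for ind in population if ind["sex"] == "F"]
    males = [ind for ind in population if ind["sex"] == "M"]
    rng.shuffle(females)
    rng.shuffle(males)
    offspring = []
    # Assumption: pairs are formed one-to-one after shuffling each sex.
    for mother, father in zip(females, males):
        n_weaned = rng.choice(weaned_counts)
        offspring.extend(make_offspring(mother, father, rng) for _ in range(n_weaned))
    return offspring


def allele_frequency(population, locus, allele):
    """Frequency of one allele at one locus in a diploid population."""
    copies = sum(ind["genotype"][locus].count(allele) for ind in population)
    return copies / (2 * len(population))
```

Repeating `next_generation` for 20 generations and recording `allele_frequency` for the tracked allele in the final generation would give one simulated replicate. MK selection would replace the shuffle with a ranking by mean kinship, which is not shown here.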

At each simulated generation, we calculated allele frequencies. To ensure that we compared the same allele over all generations, we calculated (at each generation) the allele frequency only for the allele identified as minor in the empirical founder genotypes. After 100 replicate runs, we compared the empirical allele frequency to the distribution of simulated frequencies, and calculated a p-value for each SNP as the proportion of simulated replicate frequencies that were more extreme (i.e., closer to either 0 or 1) than the empirical allele frequency. We adjusted these p-values to account for the false discovery rate using the Benjamini and Hochberg [22] correction (p.adjust; R Development Core Team 2014) and identified SNPs with adjusted p-values < 0.05 as those likely impacted by selection because they were statistically inconsistent with neutral expectations (i.e., drift). We refer to such SNPs as "nonneutral". Although we simulated each SNP independently regardless of linkage group assignment, for our analyses we grouped results into putative linkage groups, and report the sum of the number of singletons and putative groups identified as nonneutral (i.e., under selection).
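The p-value and false discovery rate steps can be sketched in Python as below. The study used `p.adjust` in R; `bh_adjust` here is a hand-written equivalent of the Benjamini and Hochberg adjustment, and `empirical_p` encodes one interpretation of "more extreme", namely a strictly smaller distance to the nearer of 0 or 1. Both function names and the example numbers are hypothetical.

```python
def empirical_p(observed, simulated):
    """Proportion of simulated frequencies closer to 0 or 1 than the observed one."""
    def distance_to_boundary(freq):
        return min(freq, 1.0 - freq)

    obs_dist = distance_to_boundary(observed)
    # Assumption: ties are not counted as more extreme.
    n_more_extreme = sum(distance_to_boundary(s) < obs_dist for s in simulated)
    return n_more_extreme / len(simulated)


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # so that adjusted values never decrease with increasing rank.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for k, idx in enumerate(order):
        rank = n - k  # rank of this p-value in ascending order
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: flag SNPs whose adjusted p-value is below 0.05.
raw_p = [0.01, 0.04, 0.03, 0.5]
adj_p = bh_adjust(raw_p)
nonneutral = [i for i, p in enumerate(adj_p) if p < 0.05]
```

With 100 replicates, the smallest nonzero p-value this approach can return is 0.01, so the resolution of the test depends directly on the number of simulated runs.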
