
Genetic diversity in an expanding population
This protocol is extracted from research article:
The feedback between selection and demography shapes genomic diversity during coevolution

Procedure

In an exponentially growing population, mutations can rapidly increase in frequency. If all individuals (cells) have comparable growth rates (i.e., when apparent fitness differences are absent), the expected distribution of variant allele frequencies (VAFs) generated by the underlying stochastic process of de novo mutation follows a Landau distribution (31), first detected in the famous Luria-Delbrück experiment (37). We showed before (30) that the VAF distribution of a neutrally evolving, exponentially growing population should follow a power law given by the cumulative distribution $M(f)=\frac{\mu}{\beta}\left(\frac{1}{f}-\frac{1}{f_{\max}}\right)$

Here, $f$ denotes the frequency of a variant within the population, $\mu$ is the mutation rate, and $\beta$ is an effective rate of surviving offspring. We then defined the universal neutrality curve $\bar{M}(f)$ given an appropriate normalization (32). The curve is defined as $\bar{M}(f)=\frac{\frac{\mu}{\beta}\left(\frac{1}{f}-\frac{1}{f_{\max}}\right)}{\max(M(f))}$
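The normalization can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the maximum of $M(f)$ is taken over a frequency window whose lower bound is called `f_min` here (a name not used in the article), and the values of `mu` and `beta` are arbitrary placeholders.

```python
def neutral_cumulative(f, mu, beta, f_max):
    """M(f) = (mu / beta) * (1/f - 1/f_max)."""
    return (mu / beta) * (1.0 / f - 1.0 / f_max)


def universal_curve(f, mu, beta, f_min, f_max):
    """M(f) divided by its maximum on [f_min, f_max].

    M(f) decreases with f, so the maximum is reached at f_min
    (f_min is an assumed name for the lower window bound).
    """
    return (neutral_cumulative(f, mu, beta, f_max)
            / neutral_cumulative(f_min, mu, beta, f_max))


F_MIN, F_MAX = 0.05, 0.40  # assumed window, taken from the 5 to 40% range
freqs = [0.05, 0.10, 0.20, 0.40]

# Two arbitrary (placeholder) parameter sets.
curve_a = [universal_curve(f, 1e-9, 0.5, F_MIN, F_MAX) for f in freqs]
curve_b = [universal_curve(f, 5e-8, 0.9, F_MIN, F_MAX) for f in freqs]

# The ratio mu / beta cancels, so both curves coincide.
print(all(abs(a - b) < 1e-12 for a, b in zip(curve_a, curve_b)))
```

Because the prefactor appears in both numerator and denominator, the normalized curve runs from 1 at the lower frequency bound to 0 at $f_{\max}$ regardless of the parameter values chosen.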

If one plots this against $1/f$, the universal neutrality curve becomes a straight line in the interval [0,1] that is independent of the mutation and offspring survival rates. In other words, if variants were generated by random de novo mutations during an expansion phase without fitness differences between individuals, genetic diversity collapses onto this line. We plotted all variants in the frequency interval between 5 and 40% and used the area under the curve (AUC) and the Kolmogorov distance from the line y = x to assess goodness of fit of a linear regression. Any time point with a Kolmogorov distance lower than 0.25 and an AUC below 3 was judged a good fit. Good fits to the linear regression indicate agreement between the theoretical scenario of an effectively neutral, exponentially growing population and the empirically observed VAF distribution.
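One way to compute a Kolmogorov distance of this kind is sketched below. It is an illustration under stated assumptions, not the authors' implementation: inverse frequencies are rescaled to [0,1] over the 5 to 40% window, the empirical curve is the fraction of variants at or above each frequency, and the distance is the largest vertical gap to y = x. The function names and the example VAF lists are hypothetical. The AUC criterion is left out because the text does not specify how that area is scaled.

```python
F_MIN, F_MAX = 0.05, 0.40  # frequency window from the text (5 to 40%)
THRESHOLD = 0.25           # Kolmogorov distance cut-off from the text


def normalized_inverse_frequency(f, f_min=F_MIN, f_max=F_MAX):
    """Rescale 1/f to [0, 1]: 0 at f_max, 1 at f_min."""
    return (1.0 / f - 1.0 / f_max) / (1.0 / f_min - 1.0 / f_max)


def kolmogorov_distance(vafs, f_min=F_MIN, f_max=F_MAX):
    """Largest vertical gap between the empirical curve and y = x."""
    xs = sorted(normalized_inverse_frequency(f, f_min, f_max)
                for f in vafs if f_min <= f <= f_max)
    n = len(xs)
    if n == 0:
        raise ValueError("no variants inside the frequency window")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # Check the gap just after and just before each step of the curve.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


# Hypothetical VAFs spaced evenly along the neutral line.
n = 50
span = 1.0 / F_MIN - 1.0 / F_MAX
neutral_like = [1.0 / ((i - 0.5) / n * span + 1.0 / F_MAX)
                for i in range(1, n + 1)]
# Hypothetical VAFs clustered at a single frequency.
clustered = [0.30] * n

d_neutral = kolmogorov_distance(neutral_like)
d_clustered = kolmogorov_distance(clustered)
print(d_neutral < THRESHOLD, d_clustered < THRESHOLD)  # True False
```

In this toy input, the evenly spaced variants fall on the diagonal and pass the 0.25 cut-off, whereas the cluster at one frequency deviates strongly from it.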

Consider a selective sweep in which a single highly adaptive cell grows exponentially. To reach a frequency of 10% by hitchhiking alone, any neutral mutation needs to occur during the first 10 cell divisions this cell undergoes. This means that with a genome size of ~42 Mb, observing power law dependencies and rapid neutral allele frequency increases of 10% during sweeps is feasible if the Chlorella mutation probability is roughly $1/(42 \times 10^{6} \times 10) = 2.4 \times 10^{-9}$ substitutions/position/cell division. While there are no good estimates of the mutation rate in Chlorella, such a rate is plausible. As the virus genome is orders of magnitude smaller (~330 kb), observing comparable power law dependencies would require a mutation rate of $1/(330 \times 10^{3} \times 10) = 3.0 \times 10^{-7}$ substitutions/position/cell division. Although such rates have been observed for DNA viruses, no increases in genetic diversity after sweeps were observed. We attribute this to the fact that small virus genomes are densely packed with protein-coding regions; hence, neutral positions are rare. Because of the low number of variants, we did not attempt to do statistics on the viral VAF distribution.
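The two rate estimates can be reproduced with a few lines of arithmetic. This sketch reads the formula as requiring about one expected substitution per genome across the first 10 divisions, which is an interpretation rather than a statement from the article; the function name is hypothetical.

```python
def required_mutation_rate(genome_size_bp, early_divisions=10):
    """Rate per position per division giving ~1 substitution per genome
    over the given number of early divisions (assumed interpretation)."""
    return 1.0 / (genome_size_bp * early_divisions)


host_rate = required_mutation_rate(42e6)    # Chlorella, ~42 Mb
virus_rate = required_mutation_rate(330e3)  # virus, ~330 kb

print(f"{host_rate:.1e}")   # 2.4e-09
print(f"{virus_rate:.1e}")  # 3.0e-07
```

The required viral rate is larger than the host rate by the ratio of the genome sizes, roughly 127-fold.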

We performed a logistic regression of time points where the VAF distribution matches the expectation under neutral expansion (largely, these are the same time points identified as sweeps due to reduced genetic diversity) on host population growth (generalized linear model with a random effect, family = binomial) using the R package lme4 (47). We included only days when virus was present and averaged growth over the 3 days leading up to the time point of genetic sampling. Replicate was included as a random effect to account for within-replicate dependency of data points.
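Written out, a model of this kind could take the following form. This is a sketch that assumes a random intercept per replicate; the symbols $y_{ij}$, $g_{ij}$, $b_0$, $b_1$, $u_j$, and $\sigma_u$ are introduced here for illustration and do not appear in the article.

$$\operatorname{logit}\,\Pr(y_{ij}=1) = b_0 + b_1\, g_{ij} + u_j, \qquad u_j \sim \mathcal{N}(0, \sigma_u^{2})$$

Here $y_{ij}=1$ if time point $i$ in replicate $j$ matches the neutral expectation, $g_{ij}$ is host population growth averaged over the 3 days before sampling, and $u_j$ is the replicate-level random effect. In lme4, such a model is typically specified with `glmer`, for example `glmer(neutral ~ growth + (1 | replicate), family = binomial)`, where the column names are hypothetical.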
