We applied several filtering criteria to remove sequencing errors, reduce technical artifacts, and retain only those alleles whose frequency patterns are driven by selection. These criteria follow other genomic time series analyses (19, 23, 24) unless there was reason to deviate. It is inherent to pooled sequencing data that low-frequency mutations are difficult to distinguish from sequencing errors (45). To exclude variation induced by sequencing error, we set a conservative coverage-based detection limit of 5% (virus) and 25% (algae). For any indel call passing this threshold, all positions within 10 bp upstream and downstream (including the indel position itself) were excluded from further analysis. Because we started the experiments with isogenic populations, any variation observed at time point 0 (<99% ancestral allele frequency) is likely an artifact, and these loci were removed from the dataset. We only included a locus if its derived allele frequency reached the detection threshold at more than one time point. Loci were also removed if the number of missing values across all time points exceeded 1 (virus) or 3 (host).

Last, in the host datasets, we observed several sets of mutations at closely neighboring reference positions with highly correlated frequency trajectories. Our setup did not permit us to ascertain whether these are multiple independent polymorphisms in the same cohort or a larger structural variant that appears as multiple SNPs in our alignment. Hence, SNPs within 1000 bp of each other and with highly correlated frequency trajectories were collapsed into one. Their frequency values were averaged, and during annotation, the most severe phenotypic effect was used as representative of each collapsed set. Because we expect selection to be nonconstant under the described dynamic eco-evolutionary conditions, we did not filter allele frequency trajectories for lack of temporal autocorrelation. Table S2 shows the number of allele frequency trajectories removed by each filtering step.

The final set of SNPs was annotated using SnpEff (46).
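The filtering steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the study. The function names, the data layout (one list of derived allele frequencies per locus, with `None` for a missing value), the greedy grouping rule, and the correlation cutoff `min_r` are all assumptions made here; the text states only that trajectories were "highly correlated" and gives no numeric cutoff.

```python
from math import sqrt


def passes_filters(traj, position, indel_positions, detection_limit,
                   max_missing, indel_window=10, ancestral_min=0.99):
    """Return True if one derived-allele frequency trajectory survives all filters.

    traj holds one frequency per time point; None marks a missing value.
    """
    # 1. Exclude loci within indel_window bp of an indel call (inclusive).
    if any(abs(position - p) <= indel_window for p in indel_positions):
        return False
    # 2. Isogenic start: ancestral frequency at time point 0 must be >= 99%.
    #    Assumption: a missing value at time point 0 also fails this check.
    if traj[0] is None or (1.0 - traj[0]) < ancestral_min:
        return False
    # 3. The derived allele must reach the detection limit at more than one time point.
    detected = sum(1 for f in traj if f is not None and f >= detection_limit)
    if detected <= 1:
        return False
    # 4. Limit the number of missing values (1 for virus, 3 for host in the text).
    missing = sum(1 for f in traj if f is None)
    return missing <= max_missing


def pearson(x, y):
    """Pearson correlation over time points observed in both trajectories."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    return sxy / sqrt(sxx * syy)


def collapse_neighbors(loci, max_dist=1000, min_r=0.9):
    """Collapse neighboring, correlated SNPs into one averaged trajectory.

    loci is a list of (position, traj) tuples sorted by position.
    Greedy single pass (an assumption): a locus joins the current group if it
    lies within max_dist bp of the group's last member and correlates with
    the group's first member at r >= min_r (hypothetical cutoff).
    """
    groups = []
    for pos, traj in loci:
        if groups:
            last_pos = groups[-1][-1][0]
            first_traj = groups[-1][0][1]
            if pos - last_pos <= max_dist and pearson(first_traj, traj) >= min_r:
                groups[-1].append((pos, traj))
                continue
        groups.append([(pos, traj)])

    collapsed = []
    for group in groups:
        mean_traj = []
        for t in range(len(group[0][1])):
            vals = [traj[t] for _, traj in group if traj[t] is not None]
            mean_traj.append(sum(vals) / len(vals) if vals else None)
        # Represent the collapsed set by its first position and mean trajectory.
        collapsed.append((group[0][0], mean_traj))
    return collapsed
```

With the thresholds from the text, a virus locus would be checked with `passes_filters(traj, pos, indels, detection_limit=0.05, max_missing=1)` and a host locus with `detection_limit=0.25, max_missing=3`. Choosing the most severe annotated effect for each collapsed set happens at the annotation stage and is not shown here.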

Given the observed population sizes and the length of the experiment, it is highly unlikely that spontaneously arising, neutrally evolving mutations reach the detection limit by drift alone. Any allele that reaches a detectable frequency is therefore either under positive selection itself or linked to an allele that is. We acknowledge that because we filtered rather stringently for potential sequencing errors, we inevitably also excluded low-frequency genomic variants from the analysis and thus did not exhaustively characterize the genomic variation in the populations. However, this temporal genome-wide SNP dataset does provide an accurate picture of the strength and speed of evolution.
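The drift argument can be made concrete with a standard bound for neutral alleles. This is a textbook result added here for illustration, not a calculation from the original analysis. Under neutrality the allele frequency $X_t$ is a martingale, so Doob's maximal inequality bounds the probability that a new mutation, arising as a single copy among $N$ genome copies, ever reaches a detection limit $f^{*}$. The symbols $N$ and $f^{*}$ are notation introduced for this sketch.

```latex
\Pr\left(\sup_{t \ge 0} X_t \ge f^{*}\right) \le \frac{X_0}{f^{*}} = \frac{1}{N f^{*}}
```

As a purely illustrative example, with a hypothetical $N = 10^{6}$ (not a value reported in the text) and $f^{*} = 0.05$, the bound is $1 / (10^{6} \times 0.05) = 2 \times 10^{-5}$ per new mutation. The bound places no limit on time, so the probability within a finite experiment can only be smaller.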
