As noted in the Materials and Methods section of the paper, we did the following:
Rarefaction analysis
Rarefaction analysis was performed by randomly selecting subsets (without replacement) of between 1 and 627 (all phages), 232 (Cluster A) or 108 (Cluster B) mycobacteriophages and determining the number of phamilies (phams) represented. This was repeated 10,000 times to generate the mean number of phams observed for each number of phage genomes selected. The means of the accumulated numbers of phams and the numbers of new phages identified are plotted as a function of the number of genomes selected at random. The observed numbers were fit to a hyperbolic function for 50% of the sample (i.e., 1 to 314, 116 or 54 genomes for all, Cluster A or Cluster B phages, respectively); Hanes-Woolf regression was used to estimate PhamMax and Km of the hyperbola:
NPhams = (PhamMax × NGenomes) / (Km + NGenomes)    (Equation 1)
where NGenomes is the number of genomes sampled, NPhams is the total number of phams seen within those genomes, PhamMax is the total number of phams among all mycobacteriophage genomes, and Km is the number of genomes required to sample one half of PhamMax. The observed data do not fit the hyperbola and instead behave as if drawn from a gene pool of infinite size, which suggests that the overall population is dynamic. The lack of hyperbolic fit does not result from outliers such as phages with highly deviant GC%, because removing these does not improve the fit. Nor is the fit substantially improved by analyzing the two largest clusters, Cluster A and Cluster B, separately (Figure 7), suggesting that the dynamic nature of the gene pool is not an artifact of examining independent phage clusters with separate gene pools. To model this behavior, we modified Equation 1 to include the introduction of novel phams via recombination with outside, non-mycobacteriophage genomes:
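The modified equation does not appear in this text. One form consistent with the definitions on either side, offered only as an assumption, adds a term linear in the number of genomes to Equation 1 and writes the half-saturation constant as KPham:

```latex
N_{Phams} = \frac{Pham_{Max} \times N_{Genomes}}{K_{Pham} + N_{Genomes}} + C_{Phage} \times N_{Genomes}
```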
where CPhage is the number of outside phams seen in each phage. The value of CPhage was estimated from Figure 7B and new values for PhamMax and KPham were estimated by Hanes-Woolf regression following data normalization.
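The sampling and Hanes-Woolf steps described above can be sketched in Python. This is a minimal illustration, not the authors' code. The input format (one set of pham identifiers per genome) and the function names `rarefaction_curve`, `hanes_woolf_fit` and `fit_first_half` are assumptions made for the example. The sketch also takes a shortcut: each replicate draws one random ordering of the genomes and records the accumulated pham count at every prefix, which gives a random subset without replacement at each size. The Hanes-Woolf step uses the linear form of Equation 1, NGenomes / NPhams = NGenomes / PhamMax + Km / PhamMax, fitted by ordinary least squares.

```python
import random
from statistics import mean


def rarefaction_curve(genome_phams, n_reps=10000, seed=0):
    """Mean number of distinct phams seen for each number of genomes sampled.

    genome_phams: list of sets, one set of pham identifiers per genome
    (hypothetical input format). Returns a list whose entry k-1 is the
    mean pham count over n_reps random subsets of k genomes.
    """
    rng = random.Random(seed)
    n = len(genome_phams)
    totals = [0] * n
    for _ in range(n_reps):
        # A random ordering; its first k entries form a random subset
        # of size k drawn without replacement.
        order = rng.sample(range(n), n)
        seen = set()
        for k, idx in enumerate(order):
            seen |= genome_phams[idx]
            totals[k] += len(seen)
    return [t / n_reps for t in totals]


def hanes_woolf_fit(n_genomes, n_phams):
    """Estimate (pham_max, km) by regressing N/NPhams on N.

    Slope = 1 / PhamMax and intercept = Km / PhamMax.
    """
    xs = list(n_genomes)
    ys = [x / p for x, p in zip(xs, n_phams)]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    pham_max = 1.0 / slope
    km = intercept * pham_max
    return pham_max, km


def fit_first_half(curve):
    """Fit the first 50% of the curve (rounded up: 627 -> 314, 108 -> 54)."""
    half = (len(curve) + 1) // 2
    return hanes_woolf_fit(range(1, half + 1), curve[:half])
```

With real data, `fit_first_half(rarefaction_curve(genome_phams))` would return estimates of PhamMax and Km that can be compared against the full observed curve to judge how well the hyperbola describes it.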
Copyright: Content may be subjected to copyright.
How to cite:
Readers should cite both the Bio-protocol preprint and the original research article where this protocol was used:
Russell, D. and Hatfull, G. (2021). Rarefaction analysis. Bio-protocol Preprint. bio-protocol.org/prep1086.
Pope, W. H., Bowman, C. A., Russell, D. A., Jacobs-Sera, D., Asai, D. J., Cresawn, S. G., Jacobs, W. R., Hendrix, R. W., Lawrence, J. G. and Hatfull, G. F. (2015). Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity. eLife. DOI: 10.7554/eLife.06416
This protocol preprint was submitted via the "Request a Protocol" track.