FASTQ data from the 10 replicates of each sample were merged, randomly re-sampled to the original data size (total number of reads) of each replicate, and analyzed with the CLARK-based pipeline. For each microbe, a CV was then calculated from the numbers of reads mapped to that microbe across the re-sampled replicates. This process was repeated 10 times, yielding 10 simulated CVs per microbe, and their mean was taken as the CV attributable to variation in data size. A linear regression model was used to evaluate how much these data-size CVs contributed to the observed overall CVs. In addition, we used a linear mixed model to evaluate whether the sequencing platform, library preparation method, and microbial class affected the observed CV. The linear mixed model was defined as:
Cv_observed ∼ Cv_datasize + Library Prep + Microbial Class + Platform + (1+Cv_observed|Center)

where Center was treated as a random effect, and the read-depth CV (Cv_datasize), library preparation method (Library Prep), microbial class (Microbial Class), and sequencing platform (Platform) were treated as fixed effects.
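For readers less familiar with lme4-style formula notation, the fixed-effects part of the model together with a per-center random intercept can be written in scalar form as below. This is a sketch under assumptions: i indexes an observation and c a center, CV^obs and CV^ds correspond to Cv_observed and Cv_datasize, the x vectors are indicator (dummy) codings of Library Prep, Microbial Class, and Platform, the random intercept and residual are assumed normally distributed, and the random-slope part of the printed formula is not expanded.

```latex
\mathrm{CV}^{\mathrm{obs}}_{ic} = \beta_0 + \beta_1\,\mathrm{CV}^{\mathrm{ds}}_{ic}
  + \boldsymbol{\beta}_{\mathrm{lib}}^{\top}\mathbf{x}^{\mathrm{lib}}_{ic}
  + \boldsymbol{\beta}_{\mathrm{class}}^{\top}\mathbf{x}^{\mathrm{class}}_{ic}
  + \boldsymbol{\beta}_{\mathrm{plat}}^{\top}\mathbf{x}^{\mathrm{plat}}_{ic}
  + u_{0c} + \varepsilon_{ic},
\qquad u_{0c} \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ic} \sim \mathcal{N}(0, \sigma^2)
```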