Paired‐end Illumina MiSeq reads corresponding to the 42 sediment samples before and after heating were prepared by trimming and filtering demultiplexed fastq files. ASVs, defined as amplicon sequences that are identical to each other (no mismatches), were then inferred from the paired‐end raw reads using the open‐source Divisive Amplicon Denoising Algorithm 2 (DADA2) bioinformatics pipeline (Callahan et al., 2016). ASV data were processed and analysed in R version 4.2.1 (RStudio Team, 2020). A Shapiro–Wilk test showed that ASV sequence counts and endospore abundances were not normally distributed across the data sets, so all subsequent analysis used non‐parametric statistical methods, implemented with the R packages phyloseq (McMurdie & Holmes, 2013), ggplot2 (Wickham, 2016) and vegan (Oksanen et al., 2014).
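The normality check that motivates the non‐parametric approach can be illustrated with a short sketch. This is not the authors' code (the study used R); it is a minimal Python equivalent using `scipy.stats.shapiro`, with simulated right‐skewed counts standing in for real ASV count data:

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for per-sample ASV counts: heavily right-skewed,
# as sequencing count data typically are (not real data from the study).
rng = np.random.default_rng(42)
asv_counts = rng.exponential(scale=100, size=42)

# Shapiro-Wilk test of the null hypothesis that the data are normal.
stat, p = stats.shapiro(asv_counts)

if p < 0.05:
    print("Normality rejected -> use non-parametric tests")
else:
    print("No evidence against normality")
```

A small p-value here justifies switching to rank‐based, non‐parametric statistics for all downstream comparisons.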
Due to the compositional nature of high‐throughput sequencing data, the absolute abundance of DNA molecules originally existing in the environment cannot be determined using nucleic acid sequencing (Gloor et al., 2017). Numbers of sequence reads therefore represent the proportion of a given ASV within a sample. Differences in total counts observed or sample read depth can influence the proportion of ASVs per sample and lead to spurious associations. Errors introduced by differences in read depth can be mitigated by applying ‘rarefaction’, i.e., subsampling read counts to a common read depth (Lozupone et al., 2011; Wong et al., 2016). Disadvantages of rarefaction include substantial loss of information (McMurdie & Holmes, 2014) and removal of rare taxa. Rarefaction of the data set obtained here would result in >60% of sequences being disregarded. Therefore, to identify novel, often rare, Firmicutes sequences in the heated sediment incubations, ASV counts were determined from the raw read data and plotted as centred log ratios (CLR; Gloor et al., 2017), a useful alternative to rarefaction (Aitchison, 1982). A caveat of CLR is that information on the precision of the data is lost during log transformation; however, the ratio remains the same irrespective of whether the data came from a large or small number of reads in a given sample library (Gloor et al., 2017). Zero read count values in the data set were replaced with the calculated median of the sample raw reads, which operates as a pseudo‐count prior to log transformation. Thus, instead of assigning an arbitrary value of one or zero counts, the pseudo‐count method is modelled on the variability in each sample (Kaul et al., 2017).
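The CLR transformation with a median pseudo‐count can be sketched as follows. This is an illustrative Python implementation under stated assumptions, not the authors' code: the function name is hypothetical, and the pseudo‐count is taken here as the median of the sample's non‐zero counts (the text does not specify whether zeros are included in that median):

```python
import numpy as np

def clr_with_median_pseudocount(counts):
    """Centred log-ratio transform of one sample's ASV count vector.

    Zeros are replaced with the median of the sample's non-zero raw
    counts (assumption: zeros excluded from the median), acting as a
    data-driven pseudo-count before log transformation.
    """
    counts = np.asarray(counts, dtype=float)
    pseudo = np.median(counts[counts > 0])          # sample-specific pseudo-count
    filled = np.where(counts == 0, pseudo, counts)  # replace zeros
    gmean = np.exp(np.mean(np.log(filled)))         # geometric mean of the sample
    return np.log(filled / gmean)                   # CLR: log of each count / g-mean

# Example: a toy ASV count vector with one zero.
clr_values = clr_with_median_pseudocount([10, 0, 5, 25])
```

By construction, CLR values within a sample sum to zero, which is why the transform depends only on ratios between ASVs and not on the sample's total read depth.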