Full-length 16S rRNA genes were amplified with the primers 27F (AGAGTTTGATCMTGGCTCAG) and 1492R (CGGTTACCTTGTTACGACTT). Samples with amplicon concentrations above 1 nM were then sequenced on a GridION (ONT, Oxford, UK) using R9 flow cells (FLO-MIN106; ONT, Oxford, UK), according to the manufacturer’s protocol. Raw reads were basecalled with Guppy v3.2.4 (run within ONT’s MinKNOW software) using the R9.4 high-accuracy model and demultiplexed with qcat v1.0.1. Sequencing summaries and read-length histograms were generated with NanoPlot v1.30.1.

Reference 16S rRNA sequences were obtained from three repositories: NCBI 16S RefSeq (RefSeq, 2020; Nucleotide search query: 33175[BioProject] OR 33317[BioProject]), RDP release 11, update 5 (Cole et al., 2014), and the SILVA rRNA database project release 132 (Cole et al., 2014). Differences in the characteristics of the three databases are summarised in Table S2. Reference sequences from each database were clustered at 97%, 99% and 100% similarity thresholds with the heuristic greedy incremental clustering algorithm implemented in CD-HIT v4.8.1 (Fu et al., 2012) (parameters: -c 0.97, 0.99 or 1.0; -M 62900; -d 250).

Demultiplexed reads were aligned to each 16S database with Minimap2 v2.12-r87 (Li, 2018) (parameters: -ax map-ont -K 100M). The resulting alignments were processed with SAMtools (Sequence Alignment/Map tools) v1.9 (Li et al., 2009): samtools view -b -F 2308 to discard unmapped, secondary (non-primary) and supplementary alignments, followed by samtools sort, samtools index and samtools idxstats. Bioinformatics scripts and FASTQ files are available at DOI 10.6084/m9.figshare.13213898.v1.

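The reference clustering step relies on CD-HIT's greedy incremental algorithm. The Python sketch that follows illustrates the idea under simplifying assumptions: sequences are visited from longest to shortest, and each one joins the first existing representative it matches at or above the threshold, otherwise it founds a new cluster. The helper names (`identity`, `greedy_cluster`) and the toy sequences are hypothetical, and identity is approximated with `difflib.SequenceMatcher` rather than CD-HIT's short-word filtering and alignment, so memberships will not reproduce CD-HIT output exactly.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Matched bases divided by the length of the shorter sequence.

    Simplified stand-in for CD-HIT identity: matched bases are counted from
    difflib's matching blocks, not from a real sequence alignment.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Greedy incremental clustering in the spirit of CD-HIT.

    Sequences are processed longest first; each joins the first representative
    it matches at >= threshold identity, otherwise it becomes a representative.
    """
    clusters: dict[str, list[str]] = {}
    for seq_id in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        for rep_id in clusters:
            if identity(seqs[seq_id], seqs[rep_id]) >= threshold:
                clusters[rep_id].append(seq_id)
                break
        else:  # no representative was similar enough
            clusters[seq_id] = [seq_id]
    return clusters


# Hypothetical toy references: ref2 differs from ref1 at its last base only.
refs = {
    "ref1": "ACGTACGTACGTACGTACGT",
    "ref2": "ACGTACGTACGTACGTACGA",
    "ref3": "TTTTGGGGCCCCAAAATTTT",
}
print(greedy_cluster(refs, 0.90))  # {'ref1': ['ref1', 'ref2'], 'ref3': ['ref3']}
print(len(greedy_cluster(refs, 0.97)))  # 3
```

Because ref1 and ref2 share 19 of 20 bases (95% identity), they merge at a 90% threshold but stay separate at 97%, which mirrors why stricter thresholds retain more distinct reference sequences.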

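The value passed to `samtools view -F` is a bitmask over the SAM FLAG field: 2308 = 4 + 256 + 2048, i.e. the unmapped (0x4), secondary (0x100) and supplementary (0x800) bits. A record is kept only if none of these bits is set. The minimal Python sketch below shows how such an exclusion mask behaves; the function names are hypothetical, and only the bit values are taken from the SAM specification.

```python
# SAM FLAG bits as defined in the SAM format specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list[str]:
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def passes_exclude_filter(flag: int, exclude: int = 2308) -> bool:
    """Mimic `samtools view -F`: keep a record only if no excluded bit is set."""
    return (flag & exclude) == 0


print(decode_flag(2308))  # ['unmapped', 'secondary', 'supplementary']

# 0 and 16 are primary alignments (forward/reverse strand); 272 = 256 + 16 is secondary.
flags = [0, 16, 4, 256, 2048, 272]
print([f for f in flags if passes_exclude_filter(f)])  # [0, 16]
```

Only primary alignments survive the filter, so each read contributes at most one alignment to the downstream `samtools idxstats` counts.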

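`samtools idxstats` writes one tab-separated line per reference sequence (name, sequence length, mapped read segments, unmapped read segments), plus a final `*` line for unplaced reads. The source does not state how these counts were normalised; one plausible approach, converting them into per-reference relative abundances, is sketched in Python below. The helpers `parse_idxstats` and `relative_abundance` and the reference names in the example are hypothetical.

```python
def parse_idxstats(text: str) -> dict[str, int]:
    """Return mapped-read counts per reference from `samtools idxstats` output."""
    counts: dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            continue  # unplaced reads are not assigned to any reference
        counts[name] = int(mapped)
    return counts


def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Share of mapped reads per reference, dropping references with zero reads."""
    detected = {name: n for name, n in counts.items() if n > 0}
    total = sum(detected.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in detected.items()}


# Hypothetical idxstats output for three references plus the '*' line.
example = "ref_A\t1500\t30\t0\nref_B\t1480\t10\t0\nref_C\t1520\t0\t0\n*\t0\t0\t0\n"
print(relative_abundance(parse_idxstats(example)))  # {'ref_A': 0.75, 'ref_B': 0.25}
```

Mapping reference identifiers to taxa (for example, genus-level summaries) would need an additional lookup table from each database, which this sketch leaves out.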