To infer the microbial composition at the species level, we sequenced the 16S rRNA V4 region for all 131 samples using the Illumina MiSeq platform. Our DNA extraction protocol included bead-beating and heating steps to expand the range of microbial species that could be observed [109]. We amplified a ~250 bp region of the microbial 16S rRNA gene with tailed primers (515F and 806R) to generate 175 bp paired-end reads [110]. The mean read depth per sample was 152,300 paired reads. Reads are publicly available under NCBI BioProject accession PRJNA528960.

16S rRNA sequencing reads from 131 samples were analyzed using UPARSE [38] with version 14 of the RDP database [111], using a 97% identity threshold. We used UPARSE to quality-filter the reads, discard singleton reads, and cluster the remaining reads into operational taxonomic units (OTUs). Representative sequences for the OTUs in FASTA format can be found in Additional file 5. UPARSE generated an OTU matrix containing 1073 OTUs from 131 samples (Additional file 6). The 7 samples that were collected at the time of UTI were excluded from downstream analysis, leaving 1070 OTUs from 124 samples. Rarefaction analysis was performed with the q2-diversity plugin in QIIME2 [52]. Read depths from 1000 to 20,000 reads per sample were tested, and the average Chao1 value was plotted against the number of samples retained. A depth of 5000 reads per sample was chosen for downstream analysis, as it gave high Chao1 values (~75% of the value at the highest depth tested) while retaining most samples (~80% of samples). Then, to normalize for uneven sequencing depth across samples, we rarefied the OTU read counts of each sample to 5000 reads, resulting in an OTU matrix with 1023 OTUs from 97 samples. We additionally filtered out OTUs with fewer than 10 reads in total to remove rare and ultra-low-abundance OTUs, for which we would lack the power to detect differences [112], resulting in 943 final OTUs across 97 samples for analysis (Additional file 7). Taxonomic assignment of each OTU was performed using the UTAX algorithm, a k-mer-based method that looks for k-mers shared between query sequences and reference sequences of known taxonomy and assigns confidence estimates based on training data, using the RDP v16 database training set provided by the UPARSE author. To assign species-level taxonomy to OTUs of interest (Fig. S3), representative sequences were queried against the NCBI 16S ribosomal RNA database using blastn (megablast, default parameters).
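
The rarefaction and low-abundance filtering steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study (which relied on UPARSE and QIIME2) and it assumes an OTUs × samples count matrix held in a NumPy array. The function names `rarefy_counts` and `rarefy_and_filter`, the fixed random seed, and the choice to subsample without replacement are assumptions made for illustration only.

```python
import numpy as np


def rarefy_counts(counts, depth, rng):
    """Subsample one sample's OTU counts to `depth` reads without replacement.

    Returns None when the sample has fewer than `depth` reads, in which
    case the caller drops the sample.
    """
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        return None
    # Drawing `depth` reads without replacement from the pool of reads,
    # where each OTU is one "color" in a multivariate hypergeometric draw.
    return rng.multivariate_hypergeometric(counts, depth)


def rarefy_and_filter(otu_table, depth=5000, min_total_reads=10, seed=0):
    """Rarefy every sample to `depth` reads, then drop low-abundance OTUs.

    otu_table: 2-D integer array, rows = OTUs, columns = samples.
    Returns (filtered_table, kept_sample_indices, kept_otu_indices).
    """
    rng = np.random.default_rng(seed)  # fixed seed: an illustrative assumption
    otu_table = np.asarray(otu_table, dtype=np.int64)

    kept_samples, columns = [], []
    for j in range(otu_table.shape[1]):
        sub = rarefy_counts(otu_table[:, j], depth, rng)
        if sub is not None:
            kept_samples.append(j)
            columns.append(sub)

    if not columns:
        empty = np.empty((0, 0), dtype=np.int64)
        return empty, [], np.array([], dtype=np.int64)

    rarefied = np.column_stack(columns)
    # Keep OTUs with at least `min_total_reads` reads summed over all
    # retained samples, i.e. remove OTUs with fewer than 10 reads in total.
    kept_otus = np.flatnonzero(rarefied.sum(axis=1) >= min_total_reads)
    return rarefied[kept_otus], kept_samples, kept_otus


if __name__ == "__main__":
    # Toy table: 3 OTUs x 3 samples; the middle sample is too shallow.
    toy = np.array([[100, 3, 30],
                    [50, 2, 40],
                    [1, 0, 0]])
    table, samples, otus = rarefy_and_filter(toy, depth=50, min_total_reads=10)
    print(samples, otus.tolist(), table.shape)
```

With the study's settings the call would use `depth=5000` and `min_total_reads=10`; samples below the depth threshold are dropped rather than padded, which is why the sample count falls when rarefying.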

To test for OTU-level compositional differences before vs. after treatment in both study arms, we used a generalized Hotelling’s test (GHT) implemented in R [40]. In brief, the GHT tests whether the average microbiome compositions of paired samples (i.e., before and after the study) are the same or different. We ran the GHT for each study arm independently.

The OTU matrix was used to calculate the α-diversity (species richness) using the Chao1 index [41] and the Shannon diversity index [42], as well as the β-diversity (species dynamics) using the Bray-Curtis dissimilarity [113]. All of these metrics were computed using QIIME v1.8 [114].
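
For readers unfamiliar with these metrics, the following Python sketch shows one common way to compute them from per-sample OTU count vectors. It is not the QIIME implementation used in the study. The function names are hypothetical, the bias-corrected form of Chao1 is assumed, and the base-2 logarithm in the Shannon index is likewise an assumption that can be changed through the `base` parameter.

```python
import numpy as np


def chao1(counts):
    """Bias-corrected Chao1 richness estimate (assumed form):
    S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of OTUs seen exactly once and twice.
    """
    counts = np.asarray(counts)
    s_obs = np.count_nonzero(counts)
    f1 = np.count_nonzero(counts == 1)
    f2 = np.count_nonzero(counts == 2)
    return float(s_obs + f1 * (f1 - 1) / (2 * (f2 + 1)))


def shannon(counts, base=2):
    """Shannon diversity index: -sum(p_i * log(p_i)) over observed OTUs."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(base))


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two samples' count vectors:
    sum(|u_i - v_i|) / sum(u_i + v_i).
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.abs(u - v).sum() / (u + v).sum())


if __name__ == "__main__":
    sample_a = [6, 4, 1, 1, 2, 0]
    sample_b = [4, 6, 0, 2, 1, 1]
    print(chao1(sample_a), shannon(sample_a), bray_curtis(sample_a, sample_b))
```

Bray-Curtis ranges from 0 (identical count vectors) to 1 (no shared OTUs), and it is usually computed on rarefied counts so that differences in sequencing depth do not dominate the comparison.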
