The sequences obtained were processed with a proprietary analysis pipeline (www.mrdnalab.com, MR DNA, Shallowater, TX). Barcodes and primers were removed, after which short sequences (<200 bp), sequences with ambiguous base calls, and sequences with homopolymer runs exceeding 6 bp were discarded. The remaining sequences were de-noised and chimeras were removed. After removal of singleton sequences, operational taxonomic units (OTUs) were defined by clustering at 3% divergence, that is, 97% similarity (Dowd et al., 2008a, 2008b, 2011; Edgar, 2010; Eren et al., 2011; Swanson et al., 2011). The remaining sequences were analyzed using BLASTn against GreenGenes databases (DeSantis et al., 2006). Bacterial taxonomy was assigned from the resulting percent identity to the reference sequences: >97% identity for species, 95-97% for genus, 90-95% for family, 80-85% for order, 77-80% for phylum, and <77% considered unclassified. All raw reads were submitted to GenBank under BioProject PRJNA288043.
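The pipeline itself is proprietary, so the following Python is only a minimal sketch of two of the steps described above: the three read-level filters (length, ambiguous bases, homopolymer runs) and the mapping from percent identity to taxonomic level. The function names, the treatment of values that fall exactly on a threshold, and the label returned for the 85-90% range (which the thresholds above do not cover) are assumptions made for illustration, not part of the published method.

```python
import re

MIN_LENGTH = 200      # reads shorter than this are discarded
MAX_HOMOPOLYMER = 6   # reads with a longer single-base run are discarded

# One base followed by at least MAX_HOMOPOLYMER copies of itself,
# i.e. a run of 7 or more identical bases.
_LONG_RUN = re.compile(r"(.)\1{%d,}" % MAX_HOMOPOLYMER)


def passes_quality_filters(seq: str) -> bool:
    """Return True if a barcode- and primer-trimmed read passes all filters."""
    seq = seq.upper()
    if len(seq) < MIN_LENGTH:
        return False
    if set(seq) - set("ACGT"):   # any ambiguous base call (N, R, Y, ...)
        return False
    if _LONG_RUN.search(seq):
        return False
    return True


def rank_from_identity(identity: float) -> str:
    """Map percent identity of the best BLASTn hit to a taxonomic level.

    Boundary handling (which side of each threshold is inclusive) is an
    assumption; the text only gives the ranges.
    """
    if identity > 97:
        return "species"
    if identity >= 95:
        return "genus"
    if identity >= 90:
        return "family"
    if identity > 85:
        # The stated thresholds leave 85-90% unassigned; flag it rather
        # than guess a level.
        return "not specified"
    if identity >= 80:
        return "order"
    if identity >= 77:
        return "phylum"
    return "unclassified"
```

For example, a 240 bp read with no ambiguous bases and no run longer than 6 bp passes the filters, and a best hit at 96% identity would be reported at genus level.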