Clustering of sequences into OTUs.

Geoffrey L. House
Saliya Ekanayake
Yang Ruan
Ursel M. E. Schütte
Wittaya Kaonongbua
Geoffrey Fox
Yuzhen Ye
James D. Bever

We tested five clustering methods, selected to represent three distinct types of underlying clustering algorithm: (i) greedy (or top-down), with AbundantOTU v.0.93b (39), CD-HIT-OTU v.0.0.2 (38), and UPARSE v.8.1.1 (42); (ii) hierarchical (or bottom-up), with mothur v.1.34.0 (41); and (iii) Bayesian, with CROP v.1.33 (40). For the greedy and hierarchical algorithms, we set the sequence similarity threshold to 97% and otherwise used default settings, except that we allowed CD-HIT-OTU to find the consensus PCR primer sequence as the first 21 bp of each sequence. For CROP, the similarity level approximated 97% (see the supplemental material for clustering commands). For each clustering method, we calculated two measures per isolate: the number of OTUs represented (OTU richness) and the Shannon diversity index of the distribution of sequences among OTUs (OTU diversity). To correct for uneven numbers of sequences among isolates, we rarefied the OTU richness value of each isolate to the minimum number of sequences per isolate (Diversispora spurca, with 85 sequences for AbundantOTU, CROP, mothur, and UPARSE and 93 sequences for CD-HIT-OTU) by using the vegan package in R (v.3.1.2; R Core Team, Vienna, Austria).
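
The three per-isolate quantities described above (OTU richness, Shannon diversity, and rarefied richness) can be illustrated with a minimal sketch. This is not the authors' code: the analysis was done with the vegan package in R, whereas the sketch below is plain Python, and it assumes that rarefaction means the analytical expected richness in a random subsample without replacement (Hurlbert's formula). The function names and the OTU counts for the two isolates are hypothetical and chosen only for illustration.

```python
from math import comb, log


def otu_richness(counts):
    """Number of OTUs with at least one sequence."""
    return sum(1 for c in counts if c > 0)


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def rarefied_richness(counts, depth):
    """Expected OTU richness in a random subsample of `depth` sequences.

    Assumption: Hurlbert's formula, i.e. for each OTU, one minus the
    probability that none of its sequences is drawn without replacement.
    """
    total = sum(counts)
    if not 0 < depth <= total:
        raise ValueError("depth must be between 1 and the total sequence count")
    denom = comb(total, depth)
    # comb(total - c, depth) is 0 when fewer than `depth` sequences remain,
    # so such an OTU is certain to appear in the subsample.
    return sum(1 - comb(total - c, depth) / denom for c in counts if c > 0)


if __name__ == "__main__":
    # Hypothetical sequence counts per OTU for two isolates.
    isolates = {
        "isolate_a": [60, 25, 10, 5],  # 100 sequences
        "isolate_b": [50, 30, 5],      # 85 sequences
    }
    # Rarefy every isolate to the smallest sequence count among isolates.
    depth = min(sum(c) for c in isolates.values())
    for name, counts in isolates.items():
        print(
            name,
            otu_richness(counts),
            round(shannon_diversity(counts), 3),
            round(rarefied_richness(counts, depth), 3),
        )
```

Rarefying to the smallest per-isolate sequence count leaves the richness of that isolate unchanged and scales every other isolate down to the same sampling depth, which is what makes richness values comparable among isolates.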
