2.4. Clustering Analysis

Bianca C. F. Santiago
Iara D. de Souza
João Vitor F. Cavalcante
Diego A. A. Morais
Mikaelly B. da Silva
Matheus Augusto de B. Pasquali
Rodrigo J. S. Dalmolin

We carried out an unsupervised clustering analysis to investigate microbial diversity across samples from different stations and layers. We used the unweighted pair group method with arithmetic mean (UPGMA), a hierarchical clustering algorithm that builds clusters from a pairwise distance matrix, here computed as the Euclidean distance between the normalized SWI values of each pair of samples. The result is a hierarchy in which samples with similar diversity profiles are grouped together. All 76 samples in our original dataset were included in the clustering analysis. The resulting dissimilarity matrix is provided in Supplementary Table S2.
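The clustering step described above can be sketched in Python with SciPy. This is a minimal illustration and not the authors' code: the function name `upgma` and the six SWI values are hypothetical, and the source does not state which software was used. SciPy's `linkage` with `method="average"` corresponds to UPGMA.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform


def upgma(values):
    """Return the square Euclidean dissimilarity matrix and the UPGMA tree."""
    # One row per sample; a 1-D list of SWI values becomes a single column.
    x = np.asarray(values, dtype=float).reshape(len(values), -1)
    # Condensed vector of pairwise Euclidean distances between samples.
    dist = pdist(x, metric="euclidean")
    # Average linkage on the distance vector is UPGMA.
    tree = linkage(dist, method="average")
    return squareform(dist), tree


# Made-up normalized SWI values for six samples (illustration only).
swi = [0.10, 0.12, 0.15, 0.80, 0.85, 0.90]
dissimilarity, tree = upgma(swi)
```

Each row of `tree` records one fusion: the two clusters merged, the height (distance) at which they were merged, and the size of the new cluster. The third column holds the fusion heights used when choosing the number of clusters.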

The optimal number of clusters was defined by the Mojena method [34]. In this approach, the heights of the dendrogram fusion points are used to determine the estimator $\theta_k$, calculated by:

$$\theta_k = \bar{\alpha} + k\,\hat{\sigma}_{\alpha}$$

where $\bar{\alpha}$ and $\hat{\sigma}_{\alpha}$ are the mean and the standard deviation of the heights of the dendrogram fusion points, respectively, and $k$ is a constant. The number of groups is determined at the fusion point where $\alpha_j > \theta_k$, with $\alpha_j$ the fusion point distances, $j = 1, \ldots, n$, and $n$ the sample size. We used $k = 1.25$ to obtain the optimal number of clusters [35].
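The stopping rule above can be sketched as follows, assuming the usual form of the Mojena estimator (mean fusion height plus k standard deviations). The function names and sample values are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not say which estimator was used. Because UPGMA fusion heights never decrease, cutting the tree at the threshold gives one more cluster than the number of fusion heights above it.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def mojena_threshold(tree, k=1.25):
    """Mean fusion height plus k standard deviations of the fusion heights."""
    heights = tree[:, 2]
    # Assumption: sample standard deviation (ddof=1).
    return heights.mean() + k * heights.std(ddof=1)


def mojena_clusters(tree, k=1.25):
    """Cut the dendrogram at the Mojena threshold and return flat clusters."""
    theta = mojena_threshold(tree, k)
    # Samples whose cophenetic distance is at most theta share a label.
    labels = fcluster(tree, t=theta, criterion="distance")
    return theta, len(np.unique(labels)), labels


# Made-up normalized SWI values for six samples (illustration only).
swi = np.array([0.10, 0.12, 0.15, 0.80, 0.85, 0.90]).reshape(-1, 1)
tree = linkage(swi, method="average", metric="euclidean")
theta, n_clusters, labels = mojena_clusters(tree, k=1.25)
```

In this toy example only the final fusion lies above the threshold, so the six samples split into two groups: the three low-SWI samples and the three high-SWI samples.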
