We carried out an unsupervised clustering analysis to investigate microbial diversity across samples from different stations and layers. We used the unweighted pair group method with arithmetic mean (UPGMA), a hierarchical clustering algorithm that operates on a pairwise distance matrix. Euclidean distances were calculated between every pair of samples from their normalized SWI values, and UPGMA then grouped the samples hierarchically according to the similarity of their diversity profiles. All 76 samples in our original dataset were included in the clustering analysis. The resulting dissimilarity matrix is provided in Supplementary Table S2.
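The clustering step described above can be illustrated with a minimal sketch. It is not the code used in the study: the function name `upgma`, the tuple-based cluster representation, and the toy input values are assumptions made for illustration, and each sample is assumed to be a vector of normalized SWI values.

```python
import math

def upgma(points):
    """Naive UPGMA on Euclidean distances (illustrative sketch only).

    points: list of equal-length numeric tuples, one per sample.
    Returns the fusions as (cluster_a, cluster_b, height), in merge order.
    """
    # Every sample starts as its own cluster, stored as a tuple of sample indices.
    clusters = [(i,) for i in range(len(points))]

    def average_distance(a, b):
        # UPGMA linkage: arithmetic mean of all pairwise distances between members.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        # Evaluate every pair of current clusters and fuse the closest one.
        candidates = [
            (average_distance(a, b), a, b)
            for idx, a in enumerate(clusters)
            for b in clusters[idx + 1:]
        ]
        height, a, b = min(candidates)
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, height))
    return merges

# Toy input: four hypothetical samples with a single normalized value each.
toy_samples = [(0.0,), (1.0,), (5.0,), (7.0,)]
for a, b, height in upgma(toy_samples):
    print(a, b, height)
```

This naive version recomputes all distances at every step, which is acceptable for a dataset of 76 samples. In practice the same average-linkage clustering is available in SciPy as `scipy.cluster.hierarchy.linkage` with `method="average"`.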
The optimal number of clusters was defined by the Mojena method [34]. In this approach, the heights of the dendrogram fusion points are used to determine the estimator $\theta_k$, calculated as:

$$\theta_k = \bar{\alpha} + k\,\hat{\sigma}_{\alpha}$$

where $\bar{\alpha}$ and $\hat{\sigma}_{\alpha}$ are the mean and the standard deviation of the heights of the dendrogram fusion points, respectively, and $k$ is a constant. The number of groups is determined at the first fusion point for which $\alpha_j > \theta_k$, where $\alpha_j$ are the fusion point distances, with $j = 1, \ldots, n-1$ and $n$ the sample size. We used a fixed value of the constant $k$ to obtain the optimal number of clusters [35].
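The stopping rule can be sketched in a few lines. This is a hedged illustration rather than the authors' implementation: the function name `mojena_n_clusters` is hypothetical, the sample standard deviation (denominator of one less than the number of fusions) is an assumption, the fusion heights are assumed to increase monotonically as they do for UPGMA, and the value of `k` in the example is illustrative only, not the one used in the study.

```python
from statistics import mean, stdev

def mojena_n_clusters(heights, k):
    """Apply a Mojena-style stopping rule to dendrogram fusion heights.

    heights: the n - 1 fusion heights of a dendrogram built from n samples.
    k: the constant of the rule (its value is chosen by the analyst).
    Returns (threshold, number_of_clusters).
    """
    heights = sorted(heights)
    # Threshold: mean fusion height plus k standard deviations.
    # Assumption: sample standard deviation (statistics.stdev).
    threshold = mean(heights) + k * stdev(heights)
    # Cutting the tree just below the first fusion that exceeds the threshold
    # leaves one more cluster than the number of fusions above it.
    n_above = sum(h > threshold for h in heights)
    return threshold, n_above + 1

# Hypothetical fusion heights and an illustrative k value.
threshold, n_clusters = mojena_n_clusters([1.0, 2.0, 5.5], k=1.0)
print(threshold, n_clusters)
```

If no fusion height exceeds the threshold, the function returns a single cluster, which corresponds to leaving the dendrogram uncut.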