Microbiome community compositions are given by a vector x = [ x1, …, xD ] of D strictly positive elements representing the abundances of each part (species, genes, …) in the community, subject to a constant-sum constraint.
Following standard concepts and definitions [47, 48], the central tendency (center or compositional mean) of a compositional data set X = [ x1, …, xn ], where xj = [ x1,j, …, xD,j ] represents one of n individual compositions, was calculated as the closed geometric mean:
$$\operatorname{cen}(X) = \operatorname{clo}(g_1, \dots, g_D), \qquad g_i = \Bigl(\prod_{j=1}^{n} x_{i,j}\Bigr)^{1/n},$$
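where gi is the geometric mean of the abundance of part i across the n compositions, and clo denotes the closure operation, which rescales a vector of positive values to a constant sum:
$$\operatorname{clo}(x) = \left[\frac{\kappa\, x_1}{\sum_{i=1}^{D} x_i}, \dots, \frac{\kappa\, x_D}{\sum_{i=1}^{D} x_i}\right],$$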
where κ is the closure constant, usually set to 1 or 100%.
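The closure and center can be sketched in Python with NumPy as below. This is an illustrative re-implementation, not the compositions package used in the study; the function names clo and center, the layout of X as n rows (compositions) by D columns (parts), and the default κ = 1 are assumptions. All abundances are assumed strictly positive, since zeros would have to be replaced before taking logarithms.

```python
import numpy as np

def clo(x, kappa=1.0):
    """Closure: rescale each composition (last axis) so its parts sum to kappa."""
    x = np.asarray(x, dtype=float)
    return kappa * x / x.sum(axis=-1, keepdims=True)

def center(X, kappa=1.0):
    """Compositional center of X (hypothetical helper).

    X has shape (n, D): one composition per row, one part per column.
    Each part's geometric mean across the n rows is computed on the log
    scale, and the resulting vector is closed to sum to kappa.
    """
    X = np.asarray(X, dtype=float)
    g = np.exp(np.log(X).mean(axis=0))  # g_i = (prod_j x_{i,j})^(1/n)
    return clo(g, kappa)
```

Because closure removes any common scale factor, rescaling an individual composition (for example, a sample sequenced at a different depth) leaves the center unchanged.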
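The dispersion of a compositional data set X is measured by the metric (or total) variance, mvar(X), which can be calculated from the variation matrix varmat(X), whose entries are the variances of all pairwise logratios:
$$\operatorname{varmat}(X) = \begin{bmatrix} \tau_{11} & \cdots & \tau_{1D} \\ \vdots & \ddots & \vdots \\ \tau_{D1} & \cdots & \tau_{DD} \end{bmatrix}, \qquad \tau_{ik} = \operatorname{var}\left(\ln\frac{x_{i,j}}{x_{k,j}}\right),$$
$$\operatorname{mvar}(X) = \frac{1}{2D}\sum_{i=1}^{D}\sum_{k=1}^{D}\tau_{ik} = \frac{1}{D}\sum_{i<k}\tau_{ik},$$
where each variance is taken across the n compositions j = 1, …, n, so that τii = 0 and τik = τki.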
For calculation, we used the functions variation and mvar in the R package compositions v2.0 [49] to obtain variation matrices and metric variances, respectively. Based on the variation matrix, we also calculated the contribution of each logratio variance to the metric variance.
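As a cross-check of these definitions, the following NumPy sketch computes the variation matrix, the metric variance and per-logratio contributions. The names variation_matrix, metric_variance and logratio_contributions are hypothetical and do not reproduce the compositions API. The sample (n − 1) variance is used; whether compositions::variation uses the same denominator is an assumption, as is the exact normalization of the contributions: here each pair i < k receives the share τik / Σ_{i<k} τik, which by the formula above equals τik / (D · mvar(X)).

```python
import numpy as np

def variation_matrix(X):
    """tau[i, k] = sample variance over the n rows of ln(x_i / x_k)."""
    L = np.log(np.asarray(X, dtype=float))   # shape (n, D)
    diffs = L[:, :, None] - L[:, None, :]    # shape (n, D, D): ln(x_i / x_k) per row
    return diffs.var(axis=0, ddof=1)         # n - 1 denominator (assumption)

def metric_variance(X):
    """mvar(X) = (1 / 2D) * sum over all i, k of tau[i, k]."""
    tau = variation_matrix(X)
    D = tau.shape[0]
    return tau.sum() / (2 * D)

def logratio_contributions(X):
    """Share of mvar(X) contributed by each logratio ln(x_i / x_k), i < k.

    Returns an array of (i, k) index pairs and the matching shares,
    which sum to 1 (hypothetical normalization, see lead-in).
    """
    tau = variation_matrix(X)
    i, k = np.triu_indices(tau.shape[0], k=1)  # each unordered pair once
    upper = tau[i, k]
    return np.column_stack([i, k]), upper / upper.sum()
```

Because clr coordinates sum to zero, mvar(X) also equals the trace of the covariance matrix of the clr-transformed data, which gives a convenient independent check of the implementation.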
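The distance between two compositions x = [ x1, …, xD ] and y = [ y1, …, yD ] is known as the Aitchison distance, dA, and is calculated as:
$$d_A(x, y) = \sqrt{\frac{1}{2D}\sum_{i=1}^{D}\sum_{k=1}^{D}\left(\ln\frac{x_i}{x_k} - \ln\frac{y_i}{y_k}\right)^2}$$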
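This is equivalent to the Euclidean distance between the compositions after centered log-ratio (clr) transformation:
$$d_A(x, y) = \sqrt{\sum_{i=1}^{D}\left(\ln\frac{x_i}{g(x)} - \ln\frac{y_i}{g(y)}\right)^2} = \lVert \operatorname{clr}(x) - \operatorname{clr}(y) \rVert, \qquad \operatorname{clr}(x) = \left[\ln\frac{x_1}{g(x)}, \dots, \ln\frac{x_D}{g(x)}\right],$$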
where g(x) is the geometric mean of the D parts of x. Accordingly, compositional principal component analysis (PCA) was performed by applying the prcomp function of R’s stats package to the clr-transformed abundance data.
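The equivalence of the pairwise-logratio and clr forms of the Aitchison distance, and a clr-based PCA, can be illustrated with the NumPy sketch below. All function names are hypothetical. clr_pca imitates prcomp's defaults (column centering, no scaling, standard deviations with an n − 1 denominator) through a singular value decomposition, but it is not R's implementation, and the signs of individual components may differ from prcomp's output.

```python
import numpy as np

def clr(X):
    """Centered log-ratio transform: ln(x_i / g(x)) for each composition (last axis)."""
    L = np.log(np.asarray(X, dtype=float))
    return L - L.mean(axis=-1, keepdims=True)

def aitchison_distance(x, y):
    """Pairwise-logratio form: sqrt(1/(2D) * sum_{i,k} (ln(x_i/x_k) - ln(y_i/y_k))^2)."""
    lx = np.log(np.asarray(x, dtype=float))
    ly = np.log(np.asarray(y, dtype=float))
    dx = lx[:, None] - lx[None, :]   # D x D matrix of ln(x_i / x_k)
    dy = ly[:, None] - ly[None, :]   # D x D matrix of ln(y_i / y_k)
    return np.sqrt(((dx - dy) ** 2).sum() / (2 * lx.size))

def clr_distance(x, y):
    """Euclidean distance between clr-transformed compositions."""
    return np.linalg.norm(clr(x) - clr(y))

def clr_pca(X):
    """PCA of clr-transformed rows of X via SVD of the column-centered matrix.

    Returns scores (like prcomp's x), loadings (like rotation) and
    component standard deviations (like sdev).
    """
    Z = clr(X)
    Zc = Z - Z.mean(axis=0)                       # center each clr column
    U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    scores = U * s                                # equals Zc @ Vt.T
    loadings = Vt.T
    sdev = s / np.sqrt(max(1, Z.shape[0] - 1))    # n - 1 denominator
    return scores, loadings, sdev
```

Since each clr vector sums to zero, the clr-transformed data span at most D − 1 dimensions, so the last principal component has (numerically) zero variance, and the squared standard deviations sum to mvar(X).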