Calculation of compositional means, metric variances, and Aitchison distances

Dieter M. Tourlousse, Koji Narita, Takamasa Miura, Mitsuo Sakamoto, Akiko Ohashi, Keita Shiina, Masami Matsuda, Daisuke Miura, Mamiko Shimamura, Yoshifumi Ohyama, Atsushi Yamazoe, Yoshihito Uchino, Keishi Kameyama, Shingo Arioka, Jiro Kataoka, Takayoshi Hisada, Kazuyuki Fujii, Shunsuke Takahashi, Miho Kuroiwa, Masatomo Rokushima, Mitsue Nishiyama, Yoshiki Tanaka, Takuya Fuchikami, Hitomi Aoki, Satoshi Kira, Ryo Koyanagi, Takeshi Naito, Morie Nishiwaki, Hirotaka Kumagai, Mikiko Konda, Ken Kasahara, Moriya Ohkuma, Hiroko Kawasaki, Yuji Sekiguchi, Jun Terauchi

A microbiome community composition is given by a vector x = [ x1, …, xD ] of D non-negative elements, each representing the abundance of one part (species, genes, …) in the community, subject to a total sum constraint.

Following standard concepts and definitions [47, 48], the central tendency (center or compositional mean) of a compositional data set X = [ x1, …, xn ], where xj = [ x1,j, …, xD,j ] represents one of n individual compositions, was calculated as the closed geometric mean:

$$\mathrm{cen}(X) = \mathrm{clo}\left[ g_1, \ldots, g_D \right], \qquad g_i = \left( \prod_{j=1}^{n} x_{i,j} \right)^{1/n}$$

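where gi is the geometric mean of the abundance of part i across the n compositions and clo represents the closure operation:

$$\mathrm{clo}(x) = \left[ \frac{\kappa\, x_1}{\sum_{i=1}^{D} x_i}, \ldots, \frac{\kappa\, x_D}{\sum_{i=1}^{D} x_i} \right]$$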

where κ is the closure constant, usually set to 1 or 100%.
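The closure operation and the closed geometric mean can be illustrated with a minimal Python sketch. It uses only the standard library, is not the authors' code, and assumes that every abundance is strictly positive so that the logarithms are defined. The function names `closure` and `compositional_center` and the toy data are hypothetical.

```python
import math


def closure(x, kappa=1.0):
    """Rescale the parts of x so that they sum to the closure constant kappa."""
    total = sum(x)
    return [kappa * xi / total for xi in x]


def compositional_center(X, kappa=1.0):
    """Closed geometric mean of a data set X (rows = compositions, columns = parts)."""
    n = len(X)
    D = len(X[0])
    # Geometric mean of each part across the n compositions, on the log scale.
    g = [math.exp(sum(math.log(row[i]) for row in X) / n) for i in range(D)]
    return closure(g, kappa)


# Hypothetical toy data: three compositions of three parts, no zeros.
X = [
    [0.2, 0.3, 0.5],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
]
center = compositional_center(X)
```

Working on the log scale avoids overflow or underflow when many small abundances are multiplied. Data containing zeros would need a zero-handling step before this calculation, which the sketch does not cover.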

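Dispersion of a compositional data set X is known as the metric (or total) variance, denoted as mvar(X), and can be calculated based on the variation matrix, denoted as varmat(X), of all possible logratio variances:

$$\mathrm{mvar}(X) = \frac{1}{2D} \sum_{i=1}^{D} \sum_{k=1}^{D} \mathrm{var}\left( \ln \frac{x_i}{x_k} \right)$$

where var(ln(xi/xk)), the variance of the logratio of parts i and k across the n compositions, is element (i, k) of varmat(X).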

For calculation, we used the functions variation and mvar in the R package compositions v2.0 [49] to obtain variation matrices and metric variances, respectively. Based on the variation matrix, we also calculated the contribution of each logratio variance to the metric variance.
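The relationship between the variation matrix and the metric variance can be sketched in plain Python. This is an illustration under stated assumptions, not a reimplementation of the compositions package: it uses the sample variance (n − 1 denominator), requires strictly positive abundances, and defines the contribution of a logratio as its share of the summed pairwise variances, which is one plausible reading of the text. All function names and the toy data are hypothetical.

```python
import math
from statistics import variance  # sample variance (n - 1 denominator)


def variation_matrix(X):
    """D x D matrix whose (i, k) entry is var(ln(x_i / x_k)) across compositions."""
    D = len(X[0])
    T = [[0.0] * D for _ in range(D)]
    for i in range(D):
        for k in range(i + 1, D):
            logratios = [math.log(row[i] / row[k]) for row in X]
            T[i][k] = T[k][i] = variance(logratios)  # symmetric, zero diagonal
    return T


def metric_variance(T):
    """Sum of all entries of the variation matrix divided by 2D."""
    D = len(T)
    return sum(sum(row) for row in T) / (2 * D)


def logratio_contributions(T):
    """Share of the metric variance attributed to each unordered pair (i, k), i < k."""
    D = len(T)
    pairs = [(i, k) for i in range(D) for k in range(i + 1, D)]
    total = sum(T[i][k] for i, k in pairs)
    return {(i, k): T[i][k] / total for i, k in pairs}


# Hypothetical toy data: three compositions of three parts, no zeros.
X = [
    [0.2, 0.3, 0.5],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
]
T = variation_matrix(X)
mvar = metric_variance(T)
contrib = logratio_contributions(T)
```

Because each unordered pair appears twice in the symmetric matrix, its contribution to mvar(X) is its variance divided by D, and the shares returned by `logratio_contributions` sum to 1.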

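The distance between two compositions x = [ x1, …, xD ] and y = [ y1, …, yD ] is known as the Aitchison distance (dA), calculated as:

$$d_A(x, y) = \sqrt{ \frac{1}{2D} \sum_{i=1}^{D} \sum_{k=1}^{D} \left( \ln \frac{x_i}{x_k} - \ln \frac{y_i}{y_k} \right)^2 }$$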

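This is equivalent to the Euclidean distance after centered log ratio (clr) transformation:

$$d_A(x, y) = \sqrt{ \sum_{i=1}^{D} \left( \ln \frac{x_i}{g(x)} - \ln \frac{y_i}{g(y)} \right)^2 }$$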

where g(x) is the geometric mean of the abundances across the parts of x. Accordingly, compositional principal component analysis (PCA) was performed using the prcomp function in R's stats package, after clr transformation of the abundance data.
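The equivalence of the two forms of the Aitchison distance can be checked numerically with a short Python sketch. It uses only the standard library, assumes strictly positive abundances, and is an illustration rather than the authors' implementation. The function names and the two example compositions are hypothetical.

```python
import math


def clr(x):
    """Centered log ratio transform: ln(x_i / g(x)) for each part i."""
    logs = [math.log(xi) for xi in x]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean g(x)
    return [value - mean_log for value in logs]


def aitchison_distance_pairwise(x, y):
    """Aitchison distance from all pairwise logratios (double sum over parts)."""
    D = len(x)
    total = 0.0
    for i in range(D):
        for k in range(D):
            total += (math.log(x[i] / x[k]) - math.log(y[i] / y[k])) ** 2
    return math.sqrt(total / (2 * D))


def aitchison_distance_clr(x, y):
    """Aitchison distance as the Euclidean distance between clr vectors."""
    return math.dist(clr(x), clr(y))


# Hypothetical example compositions with three parts each.
x = [0.2, 0.3, 0.5]
y = [0.1, 0.6, 0.3]
d_pairwise = aitchison_distance_pairwise(x, y)
d_clr = aitchison_distance_clr(x, y)
```

Both functions return the same value up to floating-point error, and the result does not change if either composition is multiplied by a constant, since only ratios between parts enter the calculation. The clr-transformed rows of a data set are also the input that would be passed to a PCA routine.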
