Simulated data generation loosely follows the procedure outlined by Weiss et al. [20]. First, a target covariance matrix, representing the underlying correlations of variables (e.g., microbes) in the count tables, is generated with diagonal elements equal to one and off-diagonal elements equal to zero. Using this covariance matrix, -dimensional multivariate normal vectors with mean zero and covariance matrix σ are drawn resulting in an matrix. The cumulative distribution function (CDF) of the standard normal distribution is then used to transform each element of the matrix into quantiles. From here, one of five marginal distributions (log-normal, exponential, gamma, negative binomial, or beta negative binomial) are imparted on each of the vectors by applying the chosen distribution’s inverse cumulative distribution function. The parameters of each distribution were randomly selected from ranges that resulted in each distribution having a comparable mean and standard deviation (Fig. 6B). Finally, random subsets of variable pairs are adjusted to reflect amensal, commensal, or exploitative relationships using the following non-linear heuristic. Given unadjusted vectors and , the pair ( are adjusted (depending on the modeled interaction) by
In the case of amensal relationships, is depressed by (9) and is left unaltered. In the case of commensal relationships, is increased by (9) and is left unaltered. Finally, for exploitative relationships is depressed by (9) and is increased by (8). By modeling pairwise interactions in this fashion, and are adjusted by a factor that: (i) is a function of the other, (ii) depends on the relative magnitudes between the two, and (iii) has non-linear components. We use the variable as a way to control the strength of relationship between and and set = 3 for the analyses performed as the adjustments at this level provided interactions with enough signal to be detected, but not enough to make detection of pairs trivial. It was ensured that each variable could only participate in one pairwise interaction. This was done to ensure that only pairwise relationships were present during analysis. We find that this heuristic provides a non-linear relationship between and without affecting their relative marginal distributions too much and works well for scope of this study. Zero-inflated count data was modeled by subtracting the mean entry of each adjusted count table from itself, then setting any negative values to zero. Prior to any analysis, count tables were subject to either TMM normalization [41], RLE normalization [44], or total sum scaling. Unless stated otherwise, count tables are designed to yield n = 50 samples of d = 1200 variables containing 100 unique examples of each ecological relationship.
Do you have any questions about this protocol?
Post your question to gather feedback from the community. We will also invite the authors of this article to respond.