Simulated data generation loosely follows the procedure outlined by Weiss et al. [20]. First, a d×d target covariance matrix σ, representing the underlying correlations of variables (e.g., microbes) in the count tables, is generated with diagonal elements equal to one and off-diagonal elements equal to zero. Using this covariance matrix, n d-dimensional multivariate normal vectors with mean zero and covariance matrix σ are drawn, resulting in an n×d matrix. The cumulative distribution function (CDF) of the standard normal distribution is then used to transform each element of the matrix into a quantile. Next, one of five marginal distributions (log-normal, exponential, gamma, negative binomial, or beta negative binomial) is imparted on each of the d columns by applying the chosen distribution’s inverse CDF. The parameters of each distribution were randomly selected from ranges that gave the distributions comparable means and standard deviations (Fig. 6B). Finally, random subsets of variable pairs are adjusted to reflect amensal, commensal, or exploitative relationships using the following non-linear heuristic. Given unadjusted vectors X = (x1, x2, ..., xn) and Y = (y1, y2, ..., yn), each pair (xi, yi) is adjusted (depending on the modeled interaction) by

In the case of amensal relationships, yi is depressed by (9) and xi is left unaltered. In the case of commensal relationships, yi is increased by (9) and xi is left unaltered. Finally, for exploitative relationships, yi is depressed by (9) and xi is increased by (8). By modeling pairwise interactions in this fashion, xi and yi are adjusted by a factor that: (i) is a function of the other, (ii) depends on the relative magnitudes of the two, and (iii) has non-linear components. We use the variable s to control the strength of the relationship between X and Y and set s = 3 for the analyses performed, as adjustments at this level provided interactions with enough signal to be detected, but not so much that detecting the pairs became trivial. Each variable was allowed to participate in only one pairwise interaction, which ensured that only pairwise relationships were present during analysis. We find that this heuristic provides a non-linear relationship between X and Y without substantially altering their marginal distributions, and that it works well for the scope of this study. Zero-inflated count data were modeled by subtracting the mean entry of each adjusted count table from every entry of that table, then setting any negative values to zero. Prior to any analysis, count tables were subjected to either TMM normalization [41], RLE normalization [44], or total sum scaling. Unless stated otherwise, count tables are designed to yield n = 50 samples of d = 1200 variables containing 100 unique examples of each ecological relationship.
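The first steps of the procedure (normal draws, CDF transform to quantiles, inverse CDF of a marginal distribution) can be sketched as follows. This is a minimal Python illustration, not the authors' code: the function name `simulate_quantile_table`, the `rate` and `seed` parameters, and the choice of an exponential marginal for every column are assumptions made for the example. Because the target covariance matrix has ones on the diagonal and zeros elsewhere, the multivariate normal draw reduces to independent standard normal draws, which is what the sketch relies on.

```python
import math
import random
from statistics import NormalDist


def simulate_quantile_table(n, d, rate=0.1, seed=0):
    """Return an n x d table (list of rows) with exponential marginals.

    Hypothetical helper: 'rate' and 'seed' are illustrative parameters,
    not values taken from the study.
    """
    rng = random.Random(seed)
    std_normal = NormalDist(0.0, 1.0)
    eps = 1e-12  # keeps quantiles strictly inside (0, 1)
    table = []
    for _ in range(n):
        # Identity covariance: each coordinate is an independent N(0, 1) draw.
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Standard normal CDF maps each value to a quantile in (0, 1).
        u = [min(max(std_normal.cdf(v), eps), 1.0 - eps) for v in z]
        # Inverse CDF of Exponential(rate): F^{-1}(u) = -ln(1 - u) / rate.
        table.append([-math.log1p(-q) / rate for q in u])
    return table


if __name__ == "__main__":
    demo = simulate_quantile_table(n=50, d=5)
    print(len(demo), len(demo[0]))
```

The other marginals named in the text would replace only the last step with their own inverse CDF, for example via the `ppf` methods in `scipy.stats`.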
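The bookkeeping steps described above (assigning each variable to at most one interaction, zero inflation, and total sum scaling) can be sketched in the same way. This is a hedged illustration under stated assumptions: the function names are hypothetical, the "mean entry" is read as the grand mean over all entries of the table, rows are treated as samples, and the adjustment equations (8) and (9) are not implemented. TMM and RLE normalization are also not shown.

```python
import random

KINDS = ("amensal", "commensal", "exploitative")


def assign_disjoint_pairs(d, pairs_per_kind, seed=0):
    """Pick non-overlapping variable pairs, pairs_per_kind for each kind."""
    needed = 2 * pairs_per_kind * len(KINDS)
    if needed > d:
        raise ValueError("not enough variables for disjoint pairs")
    rng = random.Random(seed)
    # Sampling without replacement guarantees no variable is reused.
    chosen = rng.sample(range(d), needed)
    pairs = []
    for k in range(needed // 2):
        kind = KINDS[k // pairs_per_kind]
        pairs.append((chosen[2 * k], chosen[2 * k + 1], kind))
    return pairs


def zero_inflate(table):
    """Subtract the grand mean from every entry, then clip negatives to zero."""
    total = sum(sum(row) for row in table)
    count = sum(len(row) for row in table)
    grand_mean = total / count
    return [[max(v - grand_mean, 0.0) for v in row] for row in table]


def total_sum_scale(table):
    """Divide each sample (row) by its total; all-zero rows stay zero."""
    scaled = []
    for row in table:
        row_total = sum(row)
        scaled.append([v / row_total if row_total > 0 else 0.0 for v in row])
    return scaled


if __name__ == "__main__":
    pairs = assign_disjoint_pairs(d=1200, pairs_per_kind=100)
    print(len(pairs))  # 300 pairs covering 600 distinct variables
    print(total_sum_scale(zero_inflate([[1.0, 2.0], [3.0, 6.0]])))
```

With d = 1200 and 100 pairs per relationship, 600 variables take part in an interaction and the remaining 600 stay unadjusted.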
