Data simulation

Dallace Francis; Fengzhu Sun


Concise Method


This method is extracted from research article: BMC Bioinformatics, Aug 2024

A comparative analysis of mutual information methods for pairwise relationship detection in metagenomic data

DOI: 10.1186/s12859-024-05883-7


Simulated data generation loosely follows the procedure outlined by Weiss et al. [20]. First, a $d \times d$ target covariance matrix $\sigma$, representing the underlying correlations of variables (e.g., microbes) in the count tables, is generated with diagonal elements equal to one and off-diagonal elements equal to zero (i.e., the identity matrix, so the variables are uncorrelated at this stage). Using this covariance matrix, $n$ $d$-dimensional multivariate normal vectors with mean zero and covariance matrix $\sigma$ are drawn, resulting in an $n \times d$ matrix. The cumulative distribution function (CDF) of the standard normal distribution is then used to transform each element of the matrix into a quantile. Next, one of five marginal distributions (log-normal, exponential, gamma, negative binomial, or beta negative binomial) is imparted on each of the $d$ column vectors by applying the chosen distribution's inverse CDF. The parameters of each distribution were randomly selected from ranges that gave each distribution a comparable mean and standard deviation (Fig. 6B). Finally, random subsets of variable pairs are adjusted to reflect amensal, commensal, or exploitative relationships using the following non-linear heuristic. Given unadjusted vectors $X = (x_{1}, x_{2}, \dots, x_{n})$ and $Y = (y_{1}, y_{2}, \dots, y_{n})$, each pair $(x_{i}, y_{i})$ is adjusted (depending on the modeled interaction) by

In the case of amensal relationships, $y_{i}$ is depressed by (9) and $x_{i}$ is left unaltered. In the case of commensal relationships, $y_{i}$ is increased by (9) and $x_{i}$ is left unaltered. Finally, for exploitative relationships, $y_{i}$ is depressed by (9) and $x_{i}$ is increased by (8). By modeling pairwise interactions in this fashion, $x_{i}$ and $y_{i}$ are adjusted by a factor that: (i) is a function of the other, (ii) depends on the relative magnitudes of the two, and (iii) has non-linear components. We use the variable $s$ to control the strength of the relationship between $X$ and $Y$ and set $s = 3$ for the analyses performed, as adjustments at this level produced interactions with enough signal to be detected, but not so much that detecting the pairs became trivial. Each variable was allowed to participate in at most one pairwise interaction, which ensured that only pairwise relationships were present during analysis. We find that this heuristic provides a non-linear relationship between $X$ and $Y$ without substantially altering their marginal distributions, and that it works well for the scope of this study. Zero-inflated count data were modeled by subtracting the mean entry of each adjusted count table from every entry of that table, then setting any negative values to zero. Prior to any analysis, count tables were subjected to either TMM normalization [41], RLE normalization [44], or total sum scaling. Unless stated otherwise, count tables are designed to yield $n = 50$ samples of $d = 1200$ variables containing 100 unique examples of each ecological relationship.
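The steps described above can be sketched in Python with NumPy and SciPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and all marginal parameter ranges are hypothetical (the source does not give the ranges), the beta negative binomial marginal is omitted, and the pairwise adjustment itself is left out because it depends on equations (8) and (9). Only total sum scaling is shown for normalization.

```python
import numpy as np
from scipy import stats


def random_marginal(rng):
    """Return a frozen distribution; all parameter ranges are hypothetical."""
    kind = rng.integers(4)
    if kind == 0:
        return stats.lognorm(s=rng.uniform(0.5, 1.0), scale=rng.uniform(3.0, 8.0))
    if kind == 1:
        return stats.expon(scale=rng.uniform(5.0, 10.0))
    if kind == 2:
        return stats.gamma(a=rng.uniform(1.0, 3.0), scale=rng.uniform(2.0, 5.0))
    return stats.nbinom(n=rng.uniform(1.0, 5.0), p=rng.uniform(0.2, 0.5))


def simulate_counts(n=50, d=1200, seed=0):
    """Draw an n x d table: normal draws -> quantiles -> random marginals."""
    rng = np.random.default_rng(seed)
    sigma = np.eye(d)  # ones on the diagonal, zeros elsewhere
    z = rng.multivariate_normal(np.zeros(d), sigma, size=n)
    # Standard normal CDF maps each entry to a quantile in (0, 1).
    u = np.clip(stats.norm.cdf(z), 1e-12, 1.0 - 1e-12)
    # Inverse CDF (ppf) of a randomly chosen marginal, one per column.
    cols = [random_marginal(rng).ppf(u[:, j]) for j in range(d)]
    return np.column_stack(cols).astype(float)


def assign_pairs(d, pairs_per_type, rng):
    """Pick disjoint variable pairs so each variable joins at most one pair."""
    kinds = ("amensal", "commensal", "exploitative")
    needed = 2 * pairs_per_type * len(kinds)
    if needed > d:
        raise ValueError("not enough variables for disjoint pairs")
    idx = rng.permutation(d)[:needed].reshape(-1, 2)
    return {
        kind: idx[i * pairs_per_type:(i + 1) * pairs_per_type]
        for i, kind in enumerate(kinds)
    }


def zero_inflate(table):
    """Subtract the table-wide mean entry, then set negatives to zero."""
    return np.maximum(table - table.mean(), 0.0)


def total_sum_scale(table):
    """Divide each sample (row) by its total; all-zero rows stay zero."""
    totals = table.sum(axis=1, keepdims=True)
    out = np.zeros_like(table, dtype=float)
    return np.divide(table, totals, out=out, where=totals > 0)
```

With the defaults described in the text, `assign_pairs(1200, 100, rng)` would reserve 600 of the 1200 variables for the 300 interacting pairs, and the adjustment would be applied to those pairs before `zero_inflate` and normalization.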

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
