2.2 ShareNet’s Bayesian model for information sharing

This protocol is extracted from the research article:

Bayesian information sharing enhances detection of regulatory associations in rare cell types

**Bioinformatics**, Jul 12, 2021; DOI: 10.1093/bioinformatics/btab269

Procedure

At a minimum, ShareNet requires that the chosen network inference algorithm output a continuous score for each putative edge of interest in the network of each cell type. A user first applies their network inference algorithm of choice to each cell type in a dataset to generate an initial estimate of the edge scores for that cell type. ShareNet models these initial network estimates as noisy observations of true underlying edge scores, governed by a hierarchical Bayesian model that induces similarities in predictions between similar cell types.
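As a rough illustration of this first step, the sketch below builds one matrix of edge scores per cell type. Pearson correlation is used here only as a placeholder for the user's chosen network inference algorithm, and the function name `initial_edge_scores` and the toy data are hypothetical, not part of ShareNet.

```python
import numpy as np


def initial_edge_scores(expr_by_cell_type):
    """Return an array of shape (C, G, G) holding one score matrix per cell type.

    expr_by_cell_type: list of C arrays, each of shape (n_cells_c, G).
    Pearson correlation is only a stand-in for the chosen inference algorithm.
    """
    scores = []
    for expr in expr_by_cell_type:
        corr = np.corrcoef(expr, rowvar=False)  # (G, G) gene-gene correlations
        np.fill_diagonal(corr, 0.0)  # self-edges are not of interest
        scores.append(corr)
    return np.stack(scores)


rng = np.random.default_rng(0)
# Hypothetical toy data: 3 cell types with 200, 50 and 10 cells, 5 genes each
toy_expr = [rng.normal(size=(n, 5)) for n in (200, 50, 10)]
edge_scores = initial_edge_scores(toy_expr)
print(edge_scores.shape)  # (3, 5, 5)
```

Any algorithm that returns a continuous score per edge and per cell type could be substituted for the correlation step.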

In this Bayesian model, the collection of latent cell type-to-cell type information sharing patterns is represented as a mixture of multivariate Gaussian distributions with *K* mixture components.
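A mixture of this form can be written as follows. The mixture weights ${\pi}_{k}$ (non-negative and summing to one) are notation assumed here for illustration; the remaining symbols are those used in this section.

$$
{\mathit{z}}_{i,j} \sim \sum_{k=1}^{K} {\pi}_{k}\, \mathcal{N}\left({\mu}_{k}, {\mathbf{\Sigma}}_{k}\right)
$$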

Here, ${\mathit{z}}_{i,j}$ is a *C*-dimensional vector, where *C* is the number of cell types in a dataset. Each of the *C*-by-*C* covariance matrices ${\mathbf{\Sigma}}_{1},\dots ,{\mathbf{\Sigma}}_{K}$ represents a unique cell type-to-cell type sharing pattern, with the off-diagonal entries in each covariance matrix capturing potential positive or negative correlations in the predicted edge scores between cell types.

For regularization, we place a Normal-Wishart (NW) prior over the mean ${\mu}_{k}$ and the inverse covariance ${\mathbf{\Sigma}}_{k}^{-1}$ of each of the *K* sharing patterns.
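For reference, the standard definition of a Normal-Wishart prior factorizes as shown below. The hyperparameter names ${\mu}_{0}$, ${\kappa}_{0}$, ${\mathbf{W}}_{0}$ and ${\nu}_{0}$ are assumed notation, and their values are not specified in this section.

$$
{\mathbf{\Sigma}}_{k}^{-1} \sim \mathcal{W}\left({\mathbf{W}}_{0}, {\nu}_{0}\right), \qquad {\mu}_{k} \mid {\mathbf{\Sigma}}_{k}^{-1} \sim \mathcal{N}\left({\mu}_{0}, \left({\kappa}_{0}\, {\mathbf{\Sigma}}_{k}^{-1}\right)^{-1}\right)
$$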

Each element of the *C*-dimensional vector ${\mathit{z}}_{i,j}=({z}_{i,j}^{(1)},\dots ,{z}_{i,j}^{(C)})$ then serves as a parameter for a univariate Gaussian distribution that describes the noisy distribution of an edge’s score from the chosen network inference algorithm.
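Using the observed score, latent mean and variance symbols defined in this section, this observation model can be written as:

$$
{e}_{i,j}^{(c)} \sim \mathcal{N}\left({z}_{i,j}^{(c)}, {\sigma}_{i,j}^{(c)2}\right)
$$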

Here, ${e}_{i,j}^{(c)}$ represents the observed score for the edge connecting gene *i* to gene *j* in cell type *c*. In this setup, the mean ${z}_{i,j}^{(c)}$ of the Gaussian is a latent variable that represents the true score of the edge in an ideal, noise-free scenario. Note that the noise distribution for each edge score is assumed to be Gaussian, a condition that is approximately satisfied by a range of methods we consider in this work (Supplementary Fig. S1). Optionally, one can model the mean of each edge distribution as $g({z}_{i,j}^{(c)})$, where $g(\cdot)$ is a link function that provides further flexibility for modeling the sharing patterns of edge scores; all of our results are based on the default configuration without this added non-linearity.

The variance parameter ${\sigma}_{i,j}^{(c)2}$ captures the degree of variation in the observed edge score. Importantly, the variation captured by ${\sigma}_{i,j}^{(c)2}$ represents the aggregate set of biological, technical and sample size factors that may contribute to noisy estimates of the edge score. A more detailed discussion of this variance term is presented in the next section.
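To make the generative structure concrete, the following minimal sketch simulates edge scores from a model of this shape: a sharing pattern is drawn for each edge, latent true scores are drawn from that pattern's multivariate Gaussian, and observed scores are drawn with univariate Gaussian noise. All numeric values, the mixture weights, and the choice of a noisier third cell type are illustrative assumptions, and the sketch performs no inference.

```python
import numpy as np

rng = np.random.default_rng(0)
C, K, n_edges = 3, 2, 1000  # cell types, sharing patterns, edges (toy sizes)

# Assumed mixture weights and component parameters (illustrative values only)
weights = np.array([0.7, 0.3])
means = np.zeros((K, C))
covs = np.stack([
    0.05 * np.eye(C),                         # pattern 1: weak, independent scores
    0.5 * np.eye(C) + 0.5 * np.ones((C, C)),  # pattern 2: positively correlated across cell types
])

# Draw a sharing pattern for each edge, then its latent true scores z
k = rng.choice(K, size=n_edges, p=weights)
z = np.stack([rng.multivariate_normal(means[i], covs[i]) for i in k])  # (n_edges, C)

# Noise variance per edge and cell type; the third cell type is assumed noisier,
# for example because it contains fewer cells
sigma2 = np.tile(np.array([0.01, 0.05, 0.5]), (n_edges, 1))

# Observed scores: univariate Gaussian noise around each latent true score
e = rng.normal(loc=z, scale=np.sqrt(sigma2))  # (n_edges, C)
```

In this toy setting, the observed scores in the third cell type deviate most from their latent values, which is the situation in which borrowing information from correlated cell types is expected to help.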

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
