2.2 ShareNet’s Bayesian model for information sharing

At minimum, ShareNet requires that the chosen network inference algorithm output a continuous score for each putative edge of interest in the network of each cell type. A user first applies their network inference algorithm of choice to each cell type in a dataset to generate an initial estimate of the edge scores for that cell type. ShareNet models these initial network estimates as noisy observations of true underlying edge scores, governed by a hierarchical Bayesian model that induces similarities in predictions between similar cell types.

In this Bayesian model, the collection of latent cell type-to-cell type information sharing patterns is represented as a mixture of multivariate Gaussian distributions with K mixture components.
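
Written out, the mixture described above can be expressed as follows. The mixture weights $\pi_k$ (non-negative and summing to one) are notation assumed here for illustration:

$$z_{i,j} \sim \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mu_k, \Sigma_k)$$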

Here, $z_{i,j}$ is a C-dimensional vector, where C is the number of cell types in a dataset. Each of the C-by-C covariance matrices $\Sigma_1, \ldots, \Sigma_K$ represents a unique cell type-to-cell type sharing pattern, with the off-diagonal entries in each covariance matrix capturing potential positive or negative correlations in the predicted edge scores between cell types.

For regularization, we place a Normal-Wishart (NW) prior over the mean $\mu_k$ and inverse covariance $\Sigma_k^{-1}$ parameters of each of the K sharing patterns.
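
In its standard parameterization, a Normal-Wishart prior factorizes as shown below. The hyperparameter names $\mu_0$, $\kappa_0$, $\Lambda_0$ and $\nu_0$ are assumptions made for illustration; the text does not state which symbols or values are used:

$$\Sigma_k^{-1} \sim \mathcal{W}(\Lambda_0, \nu_0), \qquad \mu_k \mid \Sigma_k \sim \mathcal{N}\!\left(\mu_0, \tfrac{1}{\kappa_0}\Sigma_k\right), \qquad k = 1, \ldots, K$$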

Each element of the C-dimensional vector $z_{i,j} = (z_{i,j}^{(1)}, \ldots, z_{i,j}^{(C)})$ then serves as a parameter for a univariate Gaussian distribution that describes the noisy distribution of an edge’s score from the chosen network inference algorithm.
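
In the notation defined in the surrounding text, with the latent score as the mean and a per-edge, per-cell-type variance, this observation model reads:

$$e_{i,j}^{(c)} \sim \mathcal{N}\!\left(z_{i,j}^{(c)}, \left(\sigma_{i,j}^{(c)}\right)^{2}\right), \qquad c = 1, \ldots, C$$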

Here, $e_{i,j}^{(c)}$ represents the observed score for the edge connecting gene i to gene j in cell type c. In this setup, the mean $z_{i,j}^{(c)}$ of the Gaussian is a latent variable that represents the true score of the edge in an ideal, noise-free scenario. Note that the noise distribution for each edge score is assumed to be Gaussian, a condition that is approximately satisfied by a range of methods we consider in this work (Supplementary Fig. S1). Optionally, one can model the mean of each edge distribution as $g(z_{i,j}^{(c)})$, where $g(\cdot)$ is a link function that provides further flexibility for modeling the sharing patterns of edge scores; all of our results are based on the default configuration without this added non-linearity.
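
To make the generative reading of the model concrete, the following is a minimal sketch that draws latent and observed edge scores from a Gaussian mixture followed by Gaussian noise. It is not ShareNet's code: the function name `sample_sharenet_like`, the argument names, the mixture weights and all numeric values are hypothetical and chosen only for illustration, and the identity link is assumed.

```python
import numpy as np

def sample_sharenet_like(n_edges, weights, means, covs, sigma, seed=0):
    """Draw latent and observed edge scores from a mixture-of-Gaussians model.

    All names here are hypothetical, for illustration only.
    weights: (K,) mixture weights, summing to 1
    means:   (K, C) component means mu_k
    covs:    (K, C, C) component covariances Sigma_k (sharing patterns)
    sigma:   (n_edges, C) noise standard deviations per edge and cell type
    """
    rng = np.random.default_rng(seed)
    K, C = means.shape
    # Assign each edge to one sharing pattern (mixture component).
    comp = rng.choice(K, size=n_edges, p=weights)
    z = np.empty((n_edges, C))
    for k in range(K):
        idx = np.flatnonzero(comp == k)
        if idx.size:
            # Latent "true" scores across the C cell types for these edges.
            z[idx] = rng.multivariate_normal(means[k], covs[k], size=idx.size)
    # Observed scores: independent Gaussian noise around each latent score.
    e = rng.normal(loc=z, scale=sigma)
    return comp, z, e

# Illustrative setup: 3 cell types, 2 sharing patterns.
C, K = 3, 2
weights = np.array([0.6, 0.4])
means = np.zeros((K, C))
covs = np.stack([
    np.full((C, C), 0.8) + 0.2 * np.eye(C),  # scores strongly shared across cell types
    np.eye(C),                               # scores independent across cell types
])
sigma = np.full((1000, C), 0.5)
comp, z, e = sample_sharenet_like(1000, weights, means, covs, sigma)
```

Edges drawn from the first pattern have latent scores that move together across cell types, while edges from the second do not; the observed scores `e` add independent noise on top of either pattern.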

The variance parameter $(\sigma_{i,j}^{(c)})^2$ captures the degree of variation in the observed edge score. Importantly, the variation captured by $(\sigma_{i,j}^{(c)})^2$ represents the aggregate set of biological, technical and sample size factors that may contribute to noisy estimates of the edge score. A more detailed discussion of this variance term is presented in the next section.
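
The role of the variance term can be illustrated with a small worked example. The sketch below computes the posterior of the latent scores for a single edge when the mixture component is held fixed, using standard Gaussian conjugacy. This is a simplification for intuition only and should not be read as ShareNet's inference procedure; the function name `posterior_given_component` and all numeric values are hypothetical.

```python
import numpy as np

def posterior_given_component(e, sigma2, mu_k, cov_k):
    """Posterior mean and covariance of z given e for one fixed component.

    Standard Gaussian conjugacy with prior N(mu_k, cov_k) and independent
    Gaussian noise with per-cell-type variances sigma2 (hypothetical names).
    """
    prior_prec = np.linalg.inv(cov_k)      # prior precision Sigma_k^{-1}
    noise_prec = np.diag(1.0 / sigma2)     # observation precision per cell type
    post_cov = np.linalg.inv(prior_prec + noise_prec)
    post_mean = post_cov @ (prior_prec @ mu_k + noise_prec @ e)
    return post_mean, post_cov

# Two cell types whose edge scores are strongly correlated a priori.
mu_k = np.zeros(2)
cov_k = np.array([[1.0, 0.9],
                  [0.9, 1.0]])
e = np.array([2.0, 0.0])         # observed scores in cell types 1 and 2
sigma2 = np.array([0.1, 10.0])   # cell type 2 is measured far more noisily
post_mean, post_cov = posterior_given_component(e, sigma2, mu_k, cov_k)
```

In this example the noisy observation in the second cell type carries little weight, so its posterior mean is pulled toward the well-measured score in the first cell type through the off-diagonal covariance. With a diagonal `cov_k`, no such borrowing occurs and each score is only shrunk toward its own prior mean.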
