
In the Bayesian framework, unknown variables were sampled and updated from their conditional posterior distributions using Markov chain Monte Carlo (MCMC) [14]. Considering the likelihood and priors in Formulae 3 and 4, the full joint posterior distribution can be written as follows:

Unobservables (c, μ, σ2) were repeatedly sampled and updated from their posteriors, conditional on all other variables. The Gibbs sampler was implemented as follows:

Initialization: Assign initial values for (μ_k, σ_k²) with k = 1, and set c_i = 1 for i = 1, …, n.

Update θi: The conditional posterior distribution of θi was
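The conditional posterior formula itself is not reproduced in this extract. Under the normal-normal conjugate structure implied by the cluster-assignment step (y_i | θ_i ~ N(θ_i, σ²_{c_i}) and θ_i ~ N(μ_{c_i}, τ_i²)), the update can be sketched as below; the function name and the standard precision-weighted form are illustrative assumptions, not the authors' exact formula:

```python
import numpy as np

def sample_theta(y_i, sigma2_k, mu_k, tau2_i, rng):
    """Draw theta_i from its conditional posterior under the assumed
    normal-normal conjugate model:
        y_i | theta_i ~ N(theta_i, sigma2_k),  theta_i ~ N(mu_k, tau2_i).
    The posterior is N(m, v) with precision-weighted mean m."""
    v = 1.0 / (1.0 / sigma2_k + 1.0 / tau2_i)   # posterior variance
    m = v * (y_i / sigma2_k + mu_k / tau2_i)    # posterior mean
    return rng.normal(m, np.sqrt(v))
```

When the data precision 1/σ²_k dominates, the draw concentrates near y_i; when the prior precision dominates, it shrinks toward μ_k.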

Update cluster indicators ci : The conditional posterior probabilities for ci were:

$$
\begin{aligned}
P(c_i = K+1 \mid \text{else}) &\propto \alpha \iint N(y_i \mid \theta_i, \sigma^2_{K+1})\, N(\theta_i \mid \mu_{K+1}, \tau_i^2)\, N(\mu_{K+1} \mid \mu_0, \sigma_0^2)\, IG(\sigma^2_{K+1} \mid r_1, r_2)\, d\mu_{K+1}\, d\sigma^2_{K+1} \\
&= \alpha \int N(y_i \mid \theta_i, \sigma^2_{K+1})\, IG(\sigma^2_{K+1} \mid r_1, r_2)\, d\sigma^2_{K+1} \int N(\theta_i \mid \mu_{K+1}, \tau_i^2)\, N(\mu_{K+1} \mid \mu_0, \sigma_0^2)\, d\mu_{K+1} \\
&= \frac{\alpha}{2\pi}\, r_2^{r_1}\, \frac{\Gamma(r_1 + \tfrac{1}{2})}{\Gamma(r_1)} \left[\tfrac{1}{2}(y_i - \theta_i)^2 + r_2\right]^{-(r_1 + \frac{1}{2})} \frac{1}{\sqrt{\tau_i^2 + \sigma_0^2}} \exp\!\left(-\frac{(\theta_i - \mu_0)^2}{2(\tau_i^2 + \sigma_0^2)}\right)
\end{aligned}
$$
where Γ(·) is the gamma function. Note that the constant 1/(n − 1 + α) was omitted from both probabilities, and (μ_{K+1}, σ²_{K+1}) were unknown and needed to be integrated out so that c_i is the only variable estimated from the Markov chain. The Dirichlet process was represented via the CRP [15]. Effects were assigned either to one of the currently held clusters or to a new cluster according to the above probabilities. If a new cluster was chosen, the number of clusters was increased by one, i.e. K → K + 1. If n_{−i,k} = 0, the kth cluster was eliminated and the cluster indicators were decreased by one, i.e. K → K − 1.

Resample and update (μ_k, σ_k²) as suggested by Formula 2, following Neal [12], as follows:

where nk is the number of effects associated with the kth mixture component. The derivations of the fully conditional posterior distributions are detailed in the Appendix.
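Formula 2 is not reproduced in this extract. Assuming the conjugate normal / inverse-gamma structure used in the assignment step above, the resampling step can be sketched as follows; the function name and the exact sufficient statistics are our illustrative assumptions, not the authors' derivation (see their Appendix):

```python
import numpy as np

def resample_cluster_params(y, theta, tau2, members, mu0, sigma02, r1, r2, rng):
    """Conditional draws for (mu_k, sigma2_k), assuming
    theta_i ~ N(mu_k, tau2_i) and y_i ~ N(theta_i, sigma2_k) for the
    n_k effects in cluster k (indices in `members`), with priors
    mu_k ~ N(mu0, sigma02) and sigma2_k ~ IG(r1, r2)."""
    th, t2, yk = theta[members], tau2[members], y[members]
    # mu_k | rest: normal-normal conjugacy, precision-weighted
    prec = 1.0 / sigma02 + np.sum(1.0 / t2)
    mean = (mu0 / sigma02 + np.sum(th / t2)) / prec
    mu_k = rng.normal(mean, np.sqrt(1.0 / prec))
    # sigma2_k | rest: inverse-gamma conjugacy with the y residuals;
    # draw Gamma on the precision scale and invert
    shape = r1 + 0.5 * len(members)
    scale = r2 + 0.5 * np.sum((yk - th) ** 2)
    sigma2_k = 1.0 / rng.gamma(shape, 1.0 / scale)
    return mu_k, sigma2_k
```

Here n_k = len(members); as in the text, the update pools only the effects currently assigned to the kth component.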

Repeat Steps 2 to 4.

The Gibbs sampler was run for 100,000 MCMC iterations to update the conditional posterior distributions. The first 80,000 samples were discarded as burn-in and the remaining 20,000 samples were used to construct the joint posterior distribution. The hyper-parameters in Algorithm 5 were set to α = 0.05, r_1 = 1, r_2 = 0.01, μ_0 = 0, σ_0² = 0.01. Among the hyper-parameters, α was empirically set to 0.05 based on the simulation results; the larger the magnitude of α, the higher the probability of a large number of clusters. Convergence was checked by inspection of negative log-likelihood plots. After the burn-in period, once the MCMC had converged to the stationary distribution, sampled parameters were collected to form the posterior distribution. We employed posterior means for estimating the mean and variance (μ̂_k, σ̂_k²) and posterior modes for estimating ĉ_i, which was further used to infer π̂_k. The Bayesian confidence interval (BCI), the counterpart of the confidence interval in frequentist statistics, was defined as the posterior probability that the parameter lies within the interval:

where α is the significance level (distinct from the DP concentration parameter above). Instead of deriving the interval analytically, the interval for (μ̂_k, σ̂_k²) was numerically estimated from quantiles of the posterior distribution.
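As a small illustration of this quantile approach (the function name is ours), the posterior mean and an equal-tailed (1 − α) interval can be read directly off the retained post-burn-in samples:

```python
import numpy as np

def summarize_posterior(samples, alpha=0.05):
    """Posterior mean and equal-tailed (1 - alpha) interval, estimated
    from post-burn-in MCMC samples by the quantile method."""
    lo, hi = np.quantile(samples, [alpha / 2.0, 1.0 - alpha / 2.0])
    return np.mean(samples), (lo, hi)
```

Applied to the 20,000 retained draws of, e.g., μ_k, this yields the point estimate and its interval in one pass.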
