Bayesian inference of progenitor types

Alfredo Llorca; Oscar Marin

Last updated: Feb 26, 2020

An abbreviated version of this protocol was published in eLife in November 2019: A stochastic framework of neurogenesis underlies the assembly of neocortical cytoarchitecture

Dear Christine,

To estimate the number of populations in the data, we developed a Markov chain Monte Carlo (MCMC) sampler ("samplin", available from the Bitbucket repository: https://bitbucket.org/giovannidiana/samplin/src/master/) that generates samples from the posterior distribution of the model parameters introduced in the Methods section for which you requested the protocol. In particular, the algorithm uses a Dirichlet process prior to perform transitions in the number of possible classes of lineages. For the theoretical framework, see Neal RM. Markov Chain Sampling Methods for Dirichlet Process Mixture Models. Journal of Computational and Graphical Statistics. 2000;9(2):249–265. For an application of the same method to different data, see Diana, Giovanni, Thomas TJ Sainsbury, and Martin P. Meyer. "Bayesian inference of neuronal assemblies." PLoS Computational Biology 15.10 (2019).
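
As a rough illustration of why a Dirichlet process prior lets the number of classes vary, the sketch below draws partitions from the Chinese restaurant process, which is the distribution over groupings that a Dirichlet process prior induces. This is not the "samplin" code and it shows only the prior, not the posterior sampler described by Neal. The function names, the concentration parameter alpha and the example values are assumptions made for illustration.

```python
import random
from collections import Counter


def sample_crp_partition(n_items, alpha, rng):
    """Draw one partition of n_items from a Chinese restaurant process.

    Returns a list whose k-th entry is the number of items in class k.
    """
    counts = []  # counts[k] = number of items already assigned to class k
    for _ in range(n_items):
        # An existing class k is chosen with weight counts[k];
        # a new class is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new class
        else:
            counts[k] += 1
    return counts


def prior_on_number_of_classes(n_items, alpha, n_draws=2000, seed=0):
    """Monte Carlo estimate of the prior over the number of classes."""
    rng = random.Random(seed)
    tally = Counter(
        len(sample_crp_partition(n_items, alpha, rng)) for _ in range(n_draws)
    )
    return {k: tally[k] / n_draws for k in sorted(tally)}


if __name__ == "__main__":
    # Hypothetical example: 100 lineages, alpha = 1.0 (illustrative values).
    for k, p in prior_on_number_of_classes(100, 1.0).items():
        print(k, round(p, 3))
```

Larger values of alpha put more prior weight on a larger number of classes. In a full sampler the data likelihood would then reweight these assignments.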

To reproduce our analysis you need a C++ compiler (standard on Linux and OS X systems).

Clone the above repository locally and build the C++ software by running "make" in a terminal.
Run the command "./bin/gibbs_data <iterations> <burn in> <trim> <maximum populations> <random seed> <data matrix> <output folder>", where:

<iterations> is the number of MCMC steps.
<burn in> is the number of samples excluded from the beginning of the Markov chain.
<trim> is the number of samples discarded along the chain before each draw is accepted (thinning, usually done to reduce correlations among samples).
<maximum populations> is the number of populations for which model parameters are recorded in the output files.
<random seed> is the seed used to initialize the Markov chain.
<data matrix> is the original data file, formatted as a matrix in which each row contains the layer occupancies of a given lineage.
<output folder> is the folder where the output files will be stored. In particular, the file "P.dat" contains draws from the posterior distribution of the number of populations.
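
As a hedged sketch of how the arguments above fit together, the Python snippet below assembles the command line in the order given and tabulates draws of the number of populations into posterior probabilities. The helper names, the argument values and the file names "lineages.dat" and "out" are hypothetical. The layout of "P.dat" is not specified here, so the comment about reading it assumes whitespace-separated integer draws.

```python
import shlex
from collections import Counter


def build_gibbs_command(iterations, burn_in, trim, max_populations,
                        random_seed, data_matrix, output_folder,
                        executable="./bin/gibbs_data"):
    """Return the argument list for gibbs_data, in the documented order."""
    args = [executable, iterations, burn_in, trim, max_populations,
            random_seed, data_matrix, output_folder]
    return [str(a) for a in args]


def posterior_of_population_count(draws):
    """Turn draws of the number of populations into relative frequencies."""
    tally = Counter(int(d) for d in draws)
    total = sum(tally.values())
    return {k: tally[k] / total for k in sorted(tally)}


if __name__ == "__main__":
    # Hypothetical values chosen only to show the argument order.
    cmd = build_gibbs_command(10000, 1000, 10, 5, 1, "lineages.dat", "out")
    print(shlex.join(cmd))
    # To actually run the sampler one could pass cmd to subprocess.run.

    # Assumption: P.dat holds whitespace-separated integer draws, e.g.
    #   draws = open("out/P.dat").read().split()
    draws = [2, 3, 3, 3]  # made-up draws, for illustration only
    print(posterior_of_population_count(draws))
```

The value with the highest frequency is the most probable number of populations under the model, and the spread of the frequencies indicates how uncertain that estimate is.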

This method was designed specifically to analyze the cortical layer occupancy of lineages using a particular model of the data. It may not be suitable for other datasets, which may well contain population structure but are not well described by the model we used. The general algorithm implemented in "samplin" can still be used, provided that a different model is implemented. For specific information on how to introduce a different model into the algorithm, please contact Giovanni Diana by email at g.diana.mail@gmail.com.

Best regards,

Giovanni Diana

How to cite:

Readers should cite both the Bio-protocol preprint and the original research article where this protocol was used:

Llorca, A. and Marin, O. (2020). Bayesian inference of progenitor types. Bio-protocol Preprint. bio-protocol.org/prep230.
Llorca, A., Ciceri, G., Beattie, R., Wong, F. K., Diana, G., Serafeimidou-Pouliou, E., Fernández-Otero, M., Streicher, C., Arnold, S. J., Meyer, M., Hippenmeyer, S., Maravall, M. and Marin, O. (2019). A stochastic framework of neurogenesis underlies the assembly of neocortical cytoarchitecture. eLife. DOI: 10.7554/eLife.51381