To estimate the number of populations in the data, we developed a Markov chain Monte Carlo sampler ("samplin", available from the Bitbucket repository https://bitbucket.org/giovannidiana/samplin/src/master/) that generates samples from the posterior distribution of the model parameters introduced in the Methods section of the article for which you requested the protocol. In particular, the algorithm uses a Dirichlet process prior to perform transitions in the number of possible classes of lineages. See Neal RM. Markov Chain Sampling Methods for Dirichlet Process Mixture Models. Journal of Computational and Graphical Statistics. 2000;9(2):249–265 for the theoretical framework, and Diana G, Sainsbury TTJ, Meyer MP. Bayesian inference of neuronal assemblies. PLoS Computational Biology. 2019;15(10) for an application of the same method to different data.
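As a minimal sketch of the Dirichlet process prior discussed by Neal (2000), the C++ function below computes the prior probability of placing one lineage in each existing class or in a new class (the Chinese restaurant process form). The function name and the concentration parameter alpha are assumptions for illustration; this is not code taken from samplin. In a full sampler these prior weights are combined with the likelihood of the lineage under each class before a class is drawn.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Prior probability of assigning one lineage to each existing class, or to a
// new class, under a Dirichlet process prior with concentration `alpha`
// (Chinese restaurant process form, see Neal 2000). `counts` holds the sizes
// of the existing classes after the lineage being reassigned has been removed.
// The last element of the result is the probability of opening a new class.
// Hypothetical helper for illustration only, not part of samplin.
std::vector<double> crp_prior_weights(const std::vector<std::size_t>& counts,
                                      double alpha) {
    assert(alpha > 0.0);
    double n_other = 0.0;  // number of other lineages, i.e. N - 1
    for (std::size_t c : counts) n_other += static_cast<double>(c);
    const double norm = n_other + alpha;
    std::vector<double> weights;
    weights.reserve(counts.size() + 1);
    for (std::size_t c : counts) weights.push_back(static_cast<double>(c) / norm);
    weights.push_back(alpha / norm);  // weight of a brand-new class
    return weights;
}
```

For example, with alpha = 1 and nine other lineages split into classes of sizes 5, 3 and 1, the weights are 0.5, 0.3, 0.1 and 0.1, the last one being the prior probability of opening a new class. Larger values of alpha make new classes more likely a priori.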
To reproduce our analysis, you need a C++ compiler (one is standard on Linux and OS X systems).
Clone the above repository locally and build the C++ software by running "make" in a terminal from within the repository folder.
Then run the command "./bin/gibbs_data <iterations> <burn in> <trim> <maximum populations> <random seed> <data matrix> <output folder>", where:
<iterations> is the number of MCMC steps
<burn in> is the number of samples excluded from the beginning of the Markov chain
<trim> is the number of samples discarded along the chain before each draw is accepted (thinning, usually done to reduce correlations among samples)
<maximum populations> is the number of populations for which model parameters are recorded in the output files
<random seed> is used to initialize the Markov chain
<data matrix> is the original data file which should be formatted as a matrix where each row contains the layer occupancies of a given lineage.
<output folder> is the folder where output files will be stored. In particular the file "P.dat" contains draws from the posterior distribution of the number of populations.
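As a hedged illustration of how <iterations>, <burn in> and <trim> interact, the C++ sketch below lists which iterations would be recorded and tallies recorded draws of the number of populations into posterior frequencies. The function names are hypothetical, and the indexing convention is an assumption: the description above does not say exactly whether samplin keeps every trim-th sample or discards trim samples between recorded draws, and this sketch assumes the former. The layout of P.dat is also not specified here, so the draws are taken as an already parsed list of integers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical illustration (not samplin's code) of how the chain-control
// arguments could select recorded draws. Assumed convention: iteration i
// (0-based) is recorded when i >= burn_in and (i - burn_in) is a multiple
// of trim.
std::vector<std::size_t> retained_iterations(std::size_t iterations,
                                             std::size_t burn_in,
                                             std::size_t trim) {
    assert(trim >= 1);
    std::vector<std::size_t> kept;
    for (std::size_t i = burn_in; i < iterations; i += trim) kept.push_back(i);
    return kept;
}

// Turn draws of the number of populations (e.g. values parsed from P.dat,
// assuming one integer per draw) into posterior frequencies.
std::map<int, double> posterior_frequencies(const std::vector<int>& draws) {
    std::map<int, double> freq;
    if (draws.empty()) return freq;
    for (int k : draws) freq[k] += 1.0;  // count occurrences of each value
    for (auto& kv : freq) kv.second /= static_cast<double>(draws.size());
    return freq;
}
```

Under this assumed convention, 10000 iterations with a burn-in of 1000 and a trim of 10 would record 900 draws, and the value with the highest frequency is the posterior mode of the number of populations.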
This method was designed specifically to analyze the cortical layer occupancy of lineages, using a specific model of these data. It may not be suitable for other datasets that do contain population structure but are not well described by the model we used. The general algorithm implemented in "samplin" can still be used, provided that a different model is implemented. For specific information on how to introduce a different model into the algorithm, please contact Giovanni Diana by email at g.diana.mail@gmail.com.
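To make the idea of swapping the model more concrete, here is a purely hypothetical C++ sketch; the class names and the interface are assumptions and do not describe samplin's actual code. In a collapsed Dirichlet process sampler of the kind described by Neal (2000), each component model mainly needs to return the predictive probability of a lineage given the lineages already assigned to a class. The toy Beta-Bernoulli model treats each layer as a binary occupied/unoccupied indicator; it is an illustrative stand-in, not the model used in the original study.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical interface (not samplin's API): a component model supplies the
// log predictive probability of a data row given the rows already in a class.
class ComponentModel {
public:
    virtual ~ComponentModel() = default;
    // `members` is empty when evaluating a brand-new class.
    virtual double log_predictive(
        const std::vector<int>& row,
        const std::vector<std::vector<int>>& members) const = 0;
};

// Toy model: every layer is an independent 0/1 occupancy with a Beta(a, b)
// prior on its occupancy probability. Row entries are assumed to be 0 or 1.
class BetaBernoulliModel : public ComponentModel {
public:
    BetaBernoulliModel(double a, double b) : a_(a), b_(b) {
        assert(a > 0.0 && b > 0.0);
    }

    double log_predictive(
        const std::vector<int>& row,
        const std::vector<std::vector<int>>& members) const override {
        const double n = static_cast<double>(members.size());
        double logp = 0.0;
        for (std::size_t layer = 0; layer < row.size(); ++layer) {
            double ones = 0.0;  // members occupying this layer
            for (const auto& m : members) ones += m[layer];
            // Beta-Bernoulli posterior predictive: (a + ones) / (a + b + n)
            const double p_one = (a_ + ones) / (a_ + b_ + n);
            logp += std::log(row[layer] == 1 ? p_one : 1.0 - p_one);
        }
        return logp;
    }

private:
    double a_, b_;
};
```

In a Gibbs update, these log predictive values would be added to the logarithm of the Dirichlet process prior weight of each class (and of a new class) before normalizing and drawing the class assignment. Replacing the model then amounts to providing a different `log_predictive`.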
Best regards,
Giovanni Diana
Copyright: Content may be subject to copyright.
How to cite:
Readers should cite both the Bio-protocol preprint and the original research article where this protocol was used:
Llorca, A. and Marin, O. (2020). Bayesian inference of progenitor types. Bio-protocol Preprint. bio-protocol.org/prep230.
Llorca, A., Ciceri, G., Beattie, R., Wong, F. K., Diana, G., Serafeimidou-Pouliou, E., Fernández-Otero, M., Streicher, C., Arnold, S. J., Meyer, M., Hippenmeyer, S., Maravall, M. and Marin, O. (2019). A stochastic framework of neurogenesis underlies the assembly of neocortical cytoarchitecture. eLife. DOI: 10.7554/eLife.51381
This protocol preprint was submitted via the "Request a Protocol" track.