Dear Zhen-Hao Luo,
Thanks for your question.
If you use the maximum likelihood (ML) version of ALE (ALEml, ALEml_undated), then the parameters will be optimised by ML during the analysis, integrating over the different ways in which the sample of gene trees can be reconciled with the species tree (this is the version we used in the paper).
There is also an implementation that samples the parameter values by MCMC (ALEmcmc, ALEmcmc_undated), although these versions of the software are not actively developed at the moment.
Instructions on running each step of the analysis can be found in the ALE documentation on GitHub (https://github.com/ssolo/ALE). Briefly, if you have a file myGeneFamily.ufboot (containing a bootstrap sample of gene trees generated with iqtree -wbtl, or an MCMC sample of trees from your favourite Bayesian phylogenetics software), you would first run
ALEobserve myGeneFamily.ufboot
resulting in myGeneFamily.ufboot.ale. Then you would run (for the undated model):
ALEml_undated myRootedSpeciesTree.tre myGeneFamily.ufboot.ale fraction_missing=myFractionMissing.txt
to fit the model by ML. The output file myRootedSpeciesTree.tre_myGeneFamily.ufboot.ale.uml_rec will contain (among other output) ML estimates of the D, T and L rates and a sample of reconciled gene trees.
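If you have many gene families, the two steps above can be wrapped in a small shell loop. This is only a sketch under some assumptions: the ALE executables are on your PATH, all .ufboot files sit in the current directory, and every family uses the same species tree and fraction-missing file. The helper rec_file_name is hypothetical (it is not part of ALE) and simply builds the output file name following the pattern described above.

```shell
#!/bin/sh
# Hypothetical file names; replace with your own.
species_tree="myRootedSpeciesTree.tre"
fraction_missing="myFractionMissing.txt"

# Hypothetical helper: build the expected reconciliation file name,
# following the pattern <species tree>_<.ale file>.uml_rec described above.
rec_file_name() {
    printf '%s_%s.uml_rec\n' "$1" "$2"
}

for boot in *.ufboot; do
    # If no .ufboot files exist, the pattern stays unexpanded; skip it.
    [ -e "$boot" ] || continue

    # Step 1: summarise the sample of gene trees (writes "$boot.ale").
    ALEobserve "$boot"

    # Step 2: fit the undated model by ML.
    ALEml_undated "$species_tree" "$boot.ale" fraction_missing="$fraction_missing"

    echo "Expected output: $(rec_file_name "$species_tree" "$boot.ale")"
done
```

Each family is processed independently, so the loop body could equally be submitted as separate jobs on a cluster.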
There are several command-line options that may be of interest. In our paper, we used the fraction_missing option to specify a file containing estimates of the missing fraction of each genome, which helps to correct the rates for the missing data. Such estimates can be obtained using e.g. CheckM or BUSCO (note: the fraction missing is 1 - completeness).
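As a rough illustration of the 1 - completeness conversion, the sketch below turns a table of completeness estimates into fraction-missing values. Several things here are assumptions rather than facts about ALE: the input is a tab-separated table without a header, with the species name in the first column and completeness in percent in the second; the function name completeness_to_missing is hypothetical; and the output uses one "species:fraction" entry per line. Please check the ALE documentation for the exact file format that fraction_missing expects, and make sure the species names match those in your species tree.

```shell
# Hypothetical helper (not part of ALE).
# Reads "species<TAB>completeness_percent" lines on stdin and writes
# "species<SEP>fraction_missing" lines on stdout.
# SEP defaults to ":"; this separator is an assumption, so verify it
# against the ALE documentation before use.
completeness_to_missing() {
    awk -F '\t' -v sep="${1:-:}" '
        # fraction missing = 1 - completeness (completeness given in percent)
        NF >= 2 { printf "%s%s%.4f\n", $1, sep, 1 - $2 / 100 }
    '
}

# Example usage (file names are placeholders):
#   completeness_to_missing < completeness.tsv > myFractionMissing.txt
```

For example, a genome estimated to be 95% complete gets a fraction missing of 0.05.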
Best,
Gergely Szollosi and Tom Williams
---------------------------------------------
Dr. Gergely J Szöllősi
MTA-ELTE „Lendület”
Evolutionary Genomics Research Group
ERC “GENECLOCKS” Research Group
head researcher
http://ssolo.web.elte.hu
Tel: 00 36 30 725 35 32
---------------------------------------------