The relative abundance of each sequence type (ST) in the metagenomic data was estimated with the 'Bayesian Identification of Bacteria (BIB)' workflow (1), following the procedure recommended on the authors' GitHub page (https://github.com/PROBIC/BIB).
The workflow requires a genome assembly for each ST and the metagenomic reads. As an initial step, the core alignment of all STs needs to be extracted. This process requires progressiveMauve (2).
$ progressiveMauve --output=full_alignment.xmfa ST_X_assembly.fasta ST_Y_assembly.fasta ST_Z_assembly.fasta
$ stripSubsetLCBs full_alignment.xmfa full_alignment.xmfa.bbcols core_alignment.xmfa 500 4
Next, convert the core alignment from XMFA to FASTA format and remove the gap characters introduced during alignment.
$ perl xmfa2fasta.pl --file core_alignment.xmfa > core_alignment.fasta
$ sed 's/-//g' core_alignment.fasta > core_alignment_gapless.fasta
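Before building the index, it can be worth confirming that the gap removal worked and seeing how long each ST's core sequence is. The function below is a minimal sketch for that purpose; `fasta_gap_check` is a hypothetical helper name and is not part of BIB, progressiveMauve, or Bowtie2. It assumes a standard FASTA file in which header lines start with `>`.

```shell
# Hypothetical helper (not part of BIB): print "<record name><TAB><length>"
# for every FASTA record and return non-zero if any sequence line still
# contains a gap character ('-').
fasta_gap_check() {
    awk '
        /^>/ {
            # New record: flush the previous one, then reset the counter.
            if (name != "") print name "\t" len
            name = substr($0, 2); len = 0
            next
        }
        {
            if (index($0, "-") > 0) gaps++   # gap left in a sequence line
            len += length($0)
        }
        END {
            if (name != "") print name "\t" len
            if (gaps > 0) {
                print gaps " sequence line(s) still contain gaps" > "/dev/stderr"
                exit 1
            }
        }
    ' "$1"
}
```

For example, `fasta_gap_check core_alignment_gapless.fasta` prints one line per record and exits with a non-zero status if gaps remain.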
Once the FASTA-formatted core alignment is ready, build a Bowtie2 index from it and align the metagenomic reads to the core alignment (core_alignment_gapless.fasta) using Bowtie2 (3).
$ bowtie2-build core_alignment_gapless.fasta core_alignment_gapless
$ bowtie2 -x core_alignment_gapless -U metagenome_reads.fastq -S metagenome_aligned.sam -a
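A quick sanity check on the SAM file can show whether reads aligned to every ST before the abundance estimation is run. The sketch below counts mapped alignment records per reference sequence using only the standard SAM columns (column 2 is FLAG, column 3 is RNAME, and FLAG bit 0x4 marks an unmapped read). `sam_hits_per_ref` is a hypothetical helper name, not a Bowtie2 or BitSeq command. Because bowtie2 was run with -a, a read can appear once per reported alignment, so these are counts of alignments, not of reads.

```shell
# Hypothetical helper (not part of Bowtie2 or BitSeq): count mapped
# alignment records per reference sequence in a SAM file.
sam_hits_per_ref() {
    awk -F '\t' '
        /^@/ { next }                    # skip SAM header lines
        int($2 / 4) % 2 == 1 { next }    # FLAG bit 0x4 set: read is unmapped
        { hits[$3]++ }                   # column 3 is the reference name
        END { for (ref in hits) print ref "\t" hits[ref] }
    ' "$1" | sort
}
```

For example, `sam_hits_per_ref metagenome_aligned.sam` prints one tab-separated line per reference that received at least one alignment.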
Then estimate the abundance of each ST from the read alignment (metagenome_aligned.sam) and the core alignment (core_alignment_gapless.fasta) using BitSeq (4).
$ parseAlignment metagenome_aligned.sam -o alignment_info.prob --trSeqFile core_alignment_gapless.fasta --trInfoFile genome_info.tr --uniform --verbose
$ estimateVBExpression -o final_abun alignment_info.prob -t genome_info.tr
After the pipeline finishes successfully, the user should have a file named 'final_abun.m_alphas', which reports the estimated abundance of each ST in the metagenomic data.
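The abundance values can be pulled out of the result file with a short awk function. The sketch below rests on assumptions about the file layout that should be checked against the BitSeq documentation before use: lines starting with `#` are comments, the first data row is a noise component rather than an ST, and the first column holds the mean relative abundance. Under those assumptions it drops the noise row and rescales the remaining values so that they sum to 1. `st_abundance` is a hypothetical helper name and is not part of BitSeq or BIB.

```shell
# Hypothetical helper (not part of BitSeq or BIB). ASSUMED file layout:
#   - lines starting with '#' are comments
#   - the first data row is a noise component, not an ST
#   - column 1 of each remaining row is the mean relative abundance
# Prints "<row index><TAB><abundance rescaled to sum to 1>".
st_abundance() {
    awk '
        /^#/ || NF == 0 { next }               # skip comments and blank lines
        !seen_noise { seen_noise = 1; next }   # ASSUMPTION: first row is noise
        { n++; val[n] = $1; total += $1 }
        END {
            if (total <= 0) {
                print "no abundance values found" > "/dev/stderr"
                exit 1
            }
            for (i = 1; i <= n; i++) printf "%d\t%.4f\n", i, val[i] / total
        }
    ' "$1"
}
```

For example, `st_abundance final_abun.m_alphas` prints one line per remaining row. Mapping row indices back to ST names depends on the order of the sequences in genome_info.tr, which this sketch does not read.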
References
1. Sankar A, et al. Microb Genom. 2016.
2. Darling AE, Mau B, and Perna NT. PLoS ONE. 2010.
3. Langmead B, and Salzberg SL. Nat Methods. 2012.
4. Glaus P, Honkela A, and Rattray M. Bioinformatics. 2012.