HMP database retrieval and MetaPhlAn2 analysis

Jicheng Wang
Jiachao Zhang
Wenjun Liu
Heping Zhang
Zhihong Sun

We used the HMP database (http://hmpdacc.org/) as the reference for structural and functional analysis. First, we downloaded the complete genome sequences and annotations of the human gut microbiome, which contains 358 publicly available human microbiome genomes generated by the National Institutes of Health (NIH) Human Microbiome Project and the European MetaHIT consortium. We also added the LcZ genome (http://www.ncbi.nlm.nih.gov/) to the database to evaluate the influence of LcZ on the microbiome. We then aligned our metagenomic and metatranscriptomic data to the genomes with Bowtie2 (65), allowing no more than one mismatch. MEGAHIT was used to generate assembled contigs (66). After that, we calculated the read count and RPKM (Reads Per Kilobase per Million mapped reads) value for each contig and gene in the database. We then obtained the abundance at each taxonomic level, from species to kingdom, by summing the relative abundances of the contigs assigned to each taxon. To estimate the functional composition of the samples consistently, we annotated the genes from the HMP database against COG orthologous groups and KEGG pathways using the blastx program with an e-value cutoff of 1e−5. We ensured that comparative analysis using these procedures was not biased by data-set origin, sample preparation, sequencing technology, or quality filtering.
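The RPKM calculation and the taxonomic roll-up described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, contig identifiers, lineage labels, and toy counts are all hypothetical, and RPKM is computed from its standard definition (reads × 10^9 / (length in bp × total mapped reads)).

```python
from collections import defaultdict


def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads Per Kilobase per Million mapped reads for one contig or gene."""
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def to_relative(values):
    """Scale a mapping of abundances so that the values sum to 1."""
    total = sum(values.values())
    return {key: (value / total if total else 0.0) for key, value in values.items()}


def roll_up(contig_abundance, contig_lineage, rank):
    """Sum contig abundances per taxon at the requested rank."""
    totals = defaultdict(float)
    for contig, abundance in contig_abundance.items():
        totals[contig_lineage[contig][rank]] += abundance
    return dict(totals)


# Hypothetical toy input: mapped read counts and lengths for three contigs.
counts = {"contig_a": 500, "contig_b": 250, "contig_c": 250}
lengths = {"contig_a": 2000, "contig_b": 1000, "contig_c": 500}
# Hypothetical lineage table; a real one would cover species up to kingdom.
lineage = {
    "contig_a": {"genus": "genus_1", "species": "species_1"},
    "contig_b": {"genus": "genus_1", "species": "species_1"},
    "contig_c": {"genus": "genus_2", "species": "species_2"},
}

total_mapped = sum(counts.values())
contig_rpkm = {c: rpkm(counts[c], lengths[c], total_mapped) for c in counts}
contig_rel = to_relative(contig_rpkm)
by_species = roll_up(contig_rel, lineage, "species")
by_genus = roll_up(contig_rel, lineage, "genus")
```

Because the roll-up only sums values that already add up to 1 across contigs, the abundances at every rank also sum to 1, which keeps the species-to-kingdom profiles directly comparable.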

For metatranscriptomic gene abundance, to study how ingestion of LcZ altered gene expression, we compared expression between day 14 and day 0, day 28 and day 0, and day 28 and day 14. First, we identified differentially expressed species (Wilcoxon rank-sum test), extracted the abundances of all genes from these species, and then identified the differentially expressed genes. We then used the WGCNA method (67) to group the differentially expressed genes into modules based on their expression patterns. After classification, we used the KEGG annotation to identify functionally enriched pathways with a hypergeometric test.
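The hypergeometric enrichment step can be illustrated with a short sketch. It assumes the usual one-sided formulation, in which the p-value is the probability of seeing at least the observed number of pathway genes in a module when drawing the module's genes at random from the annotated background. The function names and the gene identifiers are hypothetical, and no multiple-testing correction is shown because the text does not specify one.

```python
from math import comb


def hypergeom_enrichment_p(population_size, pathway_size, sample_size, overlap):
    """P(X >= overlap) when sample_size genes are drawn without replacement
    from population_size genes, of which pathway_size belong to the pathway."""
    if not 0 <= overlap <= min(pathway_size, sample_size):
        raise ValueError("overlap is outside the possible range")
    upper = min(pathway_size, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(population_size - pathway_size, sample_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(population_size, sample_size)


def pathway_enrichment(module_genes, pathway_genes, background_genes):
    """Enrichment p-value for one module against one pathway gene set."""
    background = set(background_genes)
    module = set(module_genes) & background
    pathway = set(pathway_genes) & background
    return hypergeom_enrichment_p(
        len(background), len(pathway), len(module), len(module & pathway)
    )


# Hypothetical example: 20 annotated genes, a 5-gene pathway, and a
# 4-gene module that shares 2 genes with the pathway.
background = [f"gene_{i}" for i in range(20)]
pathway = background[:5]
module = ["gene_0", "gene_1", "gene_10", "gene_11"]
p_value = pathway_enrichment(module, pathway, background)
```

Restricting both gene sets to the annotated background before counting avoids inflating the overlap with genes that could never have been drawn.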

For both metagenomic and metatranscriptomic reads, we applied the MetaPhlAn2 and GraPhlAn software (54) to obtain the relative abundance of each species. The most abundant species across all samples were used to build a dendrogram heatmap via hierarchical clustering. After calculating species abundance, we identified differentially expressed species to analyze the influence of LcZ on transcriptional variation.
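Selecting the most abundant species from the profiling output can be sketched as below. The example assumes a merged, tab-separated table in which the first column is a pipe-delimited clade string (species-level rows end in an `s__` label) and each remaining column holds one sample's relative abundance. The ranking by mean abundance, the function names, the sample labels, and the clade names are assumptions made for illustration; the text does not state how the top species were chosen or how many were kept.

```python
def species_rows(table_text):
    """Return (sample names, {species: [abundance per sample]}) from a merged table."""
    lines = [line for line in table_text.strip().splitlines() if line.strip()]
    samples = lines[0].split("\t")[1:]
    abundances = {}
    for line in lines[1:]:
        fields = line.split("\t")
        last_level = fields[0].split("|")[-1]
        # Keep species-level rows only; higher ranks and strain rows are skipped.
        if last_level.startswith("s__"):
            abundances[last_level[3:]] = [float(value) for value in fields[1:]]
    return samples, abundances


def top_species(abundances, n):
    """Rank species by mean relative abundance across samples and keep n."""
    def mean(values):
        return sum(values) / len(values)
    ranked = sorted(abundances, key=lambda name: mean(abundances[name]), reverse=True)
    return ranked[:n]


# Hypothetical merged table with three time points.
rows = [
    ["ID", "day0", "day14", "day28"],
    ["k__kingdom_1", "100.0", "100.0", "100.0"],
    ["k__kingdom_1|g__genus_1|s__species_a", "10.0", "20.0", "30.0"],
    ["k__kingdom_1|g__genus_1|s__species_b", "50.0", "40.0", "30.0"],
    ["k__kingdom_1|g__genus_2|s__species_c", "1.0", "2.0", "3.0"],
    ["k__kingdom_1|g__genus_1|s__species_a|t__strain_x", "10.0", "20.0", "30.0"],
]
table_text = "\n".join("\t".join(row) for row in rows)

samples, abundances = species_rows(table_text)
top = top_species(abundances, 2)
```

The resulting species-by-sample matrix is the kind of input a hierarchical clustering routine would take to draw the dendrogram heatmap.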
