DNA extracted from fecal samples was used for mNGS. Metagenomic libraries were constructed with a TruSeq™ DNA Sample Preparation kit (Illumina, San Diego, CA, United States) and sequenced at Shanghai Personal Biotechnology (Shanghai, China) on an Illumina HiSeq system with a 150-bp paired-end protocol. Reads from each sample were aligned to the human genome (hg19) using Bowtie 2 (Langmead and Salzberg, 2012); reads that did not match were retained and assembled with MEGAHIT (Li et al., 2015). Only assembled contigs of length ≥500 bp were kept. Genes were predicted on these contigs with MetaGeneMark (Noguchi et al., 2006) and combined into a gene set. The protein sequences of the genes were clustered with CD-HIT (Li and Godzik, 2006) at an identity cutoff of 90% to remove redundancy, yielding a unique (non-redundant) gene set. Reads from each sample were then mapped to this unique gene set. The “gene abundance” in each sample was defined as the number of reads mapped to a gene sequence divided by the gene length. The percentage of a gene's abundance relative to the whole gene catalog was called the “relative abundance.” DIAMOND (Buchfink et al., 2015) was used to align the genes against the National Center for Biotechnology Information (NCBI) NR database for annotation, with an e-value cutoff of 10-fold the minimum e-value. Based on the alignment results, genes were assigned to species with a lowest common ancestor (LCA) algorithm, implemented in in-house Perl scripts. The abundance of a species in each sample was defined as the sum of the abundances of all genes annotated to that species. Functional classification was carried out by mapping genes to the Kyoto Encyclopedia of Genes and Genomes (KEGG) protein database and the Clusters of Orthologous Groups of Proteins (COG) database (Tatusov et al., 2000) using the KEGG Orthology-Based Annotation System (KOBAS) (Xie et al., 2011) and DIAMOND, respectively.
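The three abundance definitions above (gene abundance, relative abundance, and species abundance) can be illustrated with a minimal Python sketch. This is not the authors' in-house Perl code; the function names, the toy read counts, gene lengths, and the gene-to-species map are all hypothetical and serve only to make the arithmetic concrete.

```python
from collections import defaultdict


def gene_abundance(read_counts, gene_lengths):
    """Gene abundance: reads mapped to each gene divided by its length (bp)."""
    return {gene: read_counts[gene] / gene_lengths[gene] for gene in read_counts}


def relative_abundance(abundance):
    """Relative abundance: each gene's share of the total, as a percentage."""
    total = sum(abundance.values())
    if total == 0:
        return {gene: 0.0 for gene in abundance}
    return {gene: 100.0 * value / total for gene, value in abundance.items()}


def species_abundance(abundance, gene_to_species):
    """Species abundance: sum of gene abundances annotated to the same species."""
    totals = defaultdict(float)
    for gene, value in abundance.items():
        species = gene_to_species.get(gene)
        if species is not None:  # genes without a species annotation are skipped
            totals[species] += value
    return dict(totals)


# Hypothetical toy data for a single sample (not from the study).
read_counts = {"gene_A": 300, "gene_B": 100, "gene_C": 50}
gene_lengths = {"gene_A": 1500, "gene_B": 500, "gene_C": 1000}
gene_to_species = {
    "gene_A": "Species X",
    "gene_B": "Species X",
    "gene_C": "Species Y",
}

abund = gene_abundance(read_counts, gene_lengths)
rel = relative_abundance(abund)
per_species = species_abundance(abund, gene_to_species)
```

Dividing by gene length keeps long genes from appearing more abundant simply because they attract more reads. The sketch assumes that unannotated genes are left out of the species totals, which the text does not specify.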
The Kruskal–Wallis test (Segata et al., 2011) was applied to identify genes that differed significantly in abundance among groups (p < 0.05, FDR < 0.05). In addition, principal component analysis (PCA) was used to analyze and visualize the distribution of samples and the separation among disease groups.