We downloaded genome sequences from the Unified Human Gastrointestinal Genomes (UHGG) database [2] at http://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_genomes as of September 2019. The UHGG database is a large collection of gut microbial whole-genome sequences derived from both isolate assemblies and metagenome-assembled genomes (MAGs). The inclusion of MAGs from diverse human populations and geographic locations is critical for capturing natural genetic variation within human gut species. From the UHGG database, we selected 146 species, each with 200 or more high-quality whole-genome sequences (completeness ≥ 90% and contamination rate ≤ 5%), for a total of 109,365 genomes. Twenty-nine of these species had more than 1000 genomes, and Escherichia coli_D (species id: 102506) had the most (n = 6645).
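The selection criteria above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the field names `species_id`, `completeness`, and `contamination` are hypothetical and may not match the column names in the actual UHGG metadata table, and the toy records are made up to exercise the thresholds.

```python
from collections import Counter

MIN_COMPLETENESS = 90.0   # percent, threshold stated in the text
MAX_CONTAMINATION = 5.0   # percent, threshold stated in the text
MIN_GENOMES = 200         # high-quality genomes required per species


def select_species(records, min_genomes=MIN_GENOMES):
    """Count high-quality genomes per species and keep species that
    reach min_genomes. Returns {species_id: n_high_quality_genomes}."""
    counts = Counter(
        rec["species_id"]
        for rec in records
        if float(rec["completeness"]) >= MIN_COMPLETENESS
        and float(rec["contamination"]) <= MAX_CONTAMINATION
    )
    return {sp: n for sp, n in counts.items() if n >= min_genomes}


# Toy records (hypothetical field names and values, for illustration only).
toy = [
    {"species_id": "A", "completeness": "95.2", "contamination": "1.0"},
    {"species_id": "A", "completeness": "90.0", "contamination": "5.0"},  # boundary values pass
    {"species_id": "A", "completeness": "89.9", "contamination": "0.5"},  # fails completeness
    {"species_id": "B", "completeness": "99.0", "contamination": "6.1"},  # fails contamination
]

# A lowered threshold is used here only so the toy data yields a result.
selected = select_species(toy, min_genomes=2)
print(selected)
```

With a real metadata file, the records could be supplied by `csv.DictReader(handle, delimiter="\t")`, assuming a tab-separated table whose columns are renamed to the hypothetical field names used here.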