Profiling of E. coli gene content in metagenomes and genomes

Alexander V. Tyakht
Alexander I. Manolov
Alexandra V. Kanygina
Dmitry S. Ischenko
Boris A. Kovarsky
Anna S. Popenko
Alexander V. Pavlenko
Anna V. Elizarova
Daria V. Rakitina
Julia P. Baikova
Valentina G. Ladygina
Elena S. Kostryukova
Irina Y. Karpova
Tatyana A. Semashko
Andrei K. Larin
Tatyana V. Grigoryeva
Mariya N. Sinyagina
Sergei Y. Malanin
Petr L. Shcherbakov
Anastasiya Y. Kharitonova
Igor L. Khalif
Marina V. Shapina
Igor V. Maev
Dmitriy N. Andreev
Elena A. Belousova
Yulia M. Buzunova
Dmitry G. Alexeev
Vadim M. Govorun

The gene content of E. coli in the gut metagenomes was estimated as a binary vector recording the presence or absence of each gene in the pangenome of the species. A gene was considered present if at least one read mapped to it when all reads were mapped to the reference gene catalogue. To adjust for variation in sequencing coverage and in E. coli relative abundance across metagenomes, random subsampling was simulated for each metagenome so that the total number of reads mapped to the pangenome was 80,000. Metagenomes with fewer pangenomic reads, or with less than 50% coverage of the pangenome, were excluded from the analysis of the pangenome and accessory genome. The accessory gene (AG) profile was obtained from the pangenome presence profile by removing the genes that belong to the core genome. Pangenome and accessory profiles in the same format were also produced for genomes by aligning the genomes against the reference gene catalogue [40] using BLASTn (similarity criterion: > 80% identity over > 80% of the gene length). Pairwise dissimilarity between AG profiles was calculated using the binary metric (function dist in the R package stats), that is, the proportion of genes present in only one of the two profiles among the genes present in at least one of them. Hierarchical clustering was performed using the average-linkage method.
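The subsampling, presence calling, core-gene removal and binary dissimilarity described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the input layout (one gene identifier per read mapped to the pangenome) and the fixed random seed are assumptions made here, and the 50% pangenome coverage filter is not included. Returning 0.0 for two all-zero profiles is also a choice made for this sketch.

```python
import random


def subsample_presence(read_gene_hits, pangenome_genes, depth=80_000, seed=0):
    """Presence/absence vector after subsampling mapped reads to a fixed depth.

    read_gene_hits: list with one gene ID per read mapped to the pangenome
                    (hypothetical input layout).
    Returns None when the metagenome has fewer than `depth` pangenomic reads.
    """
    if len(read_gene_hits) < depth:
        return None
    rng = random.Random(seed)
    sampled = rng.sample(read_gene_hits, depth)  # sampling without replacement
    present = set(sampled)  # a gene hit by at least one read counts as present
    return [1 if gene in present else 0 for gene in pangenome_genes]


def accessory_profile(presence, pangenome_genes, core_genes):
    """Drop core-genome genes from a pangenome presence vector."""
    return [p for p, gene in zip(presence, pangenome_genes)
            if gene not in core_genes]


def binary_distance(a, b):
    """Share of positions set in exactly one vector among those set in either."""
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        return 0.0  # assumption: two empty profiles are treated as identical
    only_one = sum(1 for x, y in zip(a, b) if bool(x) != bool(y))
    return only_one / either
```

The resulting pairwise distances could then be passed to any average-linkage hierarchical clustering routine.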

When refining the orthology groups (OGs) specific to Clade 1, a candidate signature OG of Clade 1 was excluded if an OG with an identical functional description was detected in ≥20% of the AG profiles in Clade 2. Here, the OG similarity score was defined as the product of the percent sequence identity and the percent query matching length, averaged over all possible pairs of genes between the two groups.
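The two rules in this paragraph can be illustrated with a short sketch. Everything named here is hypothetical: the `alignments` lookup, the helper names and the input layouts are assumptions, not part of the published method. The sketch also assumes that percentages are rescaled to fractions before multiplying and that gene pairs without an alignment contribute zero. The text does not specify either point.

```python
from itertools import product


def og_similarity(group_a, group_b, alignments):
    """Mean of identity x query matching length over all gene pairs of two OGs.

    alignments: dict mapping (gene_a, gene_b) to a tuple
                (percent_identity, percent_query_length); hypothetical layout.
    Pairs missing from `alignments` contribute 0 (assumption of this sketch).
    """
    pairs = list(product(group_a, group_b))
    if not pairs:
        return 0.0
    total = 0.0
    for pair in pairs:
        identity, query_len = alignments.get(pair, (0.0, 0.0))
        # Percentages are rescaled to fractions (assumption), so scores lie in [0, 1].
        total += (identity / 100.0) * (query_len / 100.0)
    return total / len(pairs)


def is_excluded(function_description, clade2_profiles, threshold=0.20):
    """True if the description occurs in at least `threshold` of Clade 2 profiles.

    clade2_profiles: list of sets, each holding the function descriptions of
                     the OGs present in one Clade 2 AG profile (hypothetical).
    """
    if not clade2_profiles:
        return False
    hits = sum(1 for profile in clade2_profiles
               if function_description in profile)
    return hits / len(clade2_profiles) >= threshold
```

For example, with two gene pairs scoring 100% x 100% and 50% x 80%, `og_similarity` averages 1.0 and 0.4.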
