Gene content matrix (Figure S1A)

Javier Tamames, Pablo D. Sánchez, Pablo I. Nikel, Carlos Pedrós-Alió

A set of 1384 completely sequenced prokaryotic genomes was used as the source of gene content information, with clusters of orthologous groups (COGs; Tatusov et al., 1997) providing the functional annotations. To obtain a set of genomes annotated with comparable completeness, we analyzed the number of COGs as a function of genome size across the 1384 genomes. The two variables were directly related in most genomes (Figure S2 in the Supplementary Material). We removed the 98 genomes that did not conform to this general trend, leaving a final set of 1286 genomes belonging to 992 different species. From these we derived an initial gene content matrix with species as rows and the 4873 COGs as columns, each cell holding the abundance of that COG in that species. For species with several strains, we averaged the abundance of each COG across all strains.
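The filtering and strain-averaging steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the text does not say how non-conforming genomes were identified, so the least-squares line and the residual cutoff `max_z` are assumptions, and the function names `fit_line`, `filter_genomes`, and `average_rows` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def filter_genomes(genomes, max_z=2.0):
    """Keep genomes whose COG count lies close to the COG-count vs. size trend.

    genomes: dict genome_id -> (genome_size, number_of_cogs)
    max_z:   ASSUMED cutoff, in standard deviations of the residuals;
             the source does not state the criterion that was used.
    """
    ids = list(genomes)
    xs = [genomes[g][0] for g in ids]
    ys = [genomes[g][1] for g in ids]
    a, b = fit_line(xs, ys)
    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    sd = pstdev(residuals)
    return [g for g, r in zip(ids, residuals) if sd == 0 or abs(r) <= max_z * sd]


def average_rows(profiles, group_of):
    """Average COG abundance profiles within each group (e.g. strains -> species).

    profiles: dict item_id -> dict COG -> abundance (an absent COG counts as 0)
    group_of: dict item_id -> group label
    """
    members = defaultdict(list)
    for item, profile in profiles.items():
        members[group_of[item]].append(profile)
    averaged = {}
    for group, plist in members.items():
        cogs = set().union(*plist)  # every COG seen in any member of the group
        averaged[group] = {
            c: sum(p.get(c, 0) for p in plist) / len(plist) for c in cogs
        }
    return averaged
```

Treating a COG that is missing from one strain as abundance 0 keeps the average over all strains of the species, which matches the averaging described above.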

Genus was chosen as the working rank because the assignment of environmental sequences to species could not be resolved in many instances, and because many species have rarely been observed in natural samples. Mapping species to genera both facilitates classification and reduces the number of taxonomic units to work with. To generate a gene content matrix at the genus level, we averaged the abundance of each COG across all the species belonging to each genus, obtaining a matrix of 4873 COGs in 503 genera. We were also able to generate sub-matrices for particular subsets of genes, such as those belonging to particular metabolic pathways or functional categories, simply by selecting the COGs involved in those processes. We also recorded several phenotypic characteristics (acidophilic, halophilic, psychrophilic, thermophilic, alkalophilic) and metabolic characteristics (phototrophic, nitrate reducer, sulfate reducer, methanogen, and reduced, streamlined genomes) for the taxa in this study, according to the literature.
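The genus-level averaging and the selection of sub-matrices can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: `genus_matrix` and `sub_matrix` are hypothetical names, the default rule for deriving a genus (first word of the species name) is a simplification, and the abundance values and the COG list are made-up toy data.

```python
from collections import defaultdict


def genus_matrix(species_matrix, genus_of=lambda name: name.split()[0]):
    """Average species-level COG abundances within each genus.

    species_matrix: dict species -> dict COG -> abundance (absent COG = 0)
    genus_of:       maps a species name to its genus; the default (first
                    word of the binomial) is an ASSUMPTION for illustration.
    """
    by_genus = defaultdict(list)
    for species, profile in species_matrix.items():
        by_genus[genus_of(species)].append(profile)
    result = {}
    for genus, plist in by_genus.items():
        cogs = set().union(*plist)  # every COG seen in any species of the genus
        result[genus] = {
            c: sum(p.get(c, 0) for p in plist) / len(plist) for c in cogs
        }
    return result


def sub_matrix(matrix, cogs):
    """Restrict a gene content matrix to a chosen list of COGs."""
    return {
        row: {c: profile.get(c, 0) for c in cogs}
        for row, profile in matrix.items()
    }


# Toy data: invented abundances, for illustration only.
species = {
    "Escherichia coli": {"COG0001": 2.0, "COG0002": 1.0},
    "Escherichia fergusonii": {"COG0001": 4.0},
    "Bacillus subtilis": {"COG0002": 3.0},
}
genera = genus_matrix(species)
pathway = sub_matrix(genera, ["COG0002"])  # hypothetical pathway COG list
print(pathway)
```

Keeping the sub-matrix as a plain selection of columns means any pathway or functional category can be analyzed by supplying its list of COG identifiers, without recomputing the genus averages.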
