Generation of the Malaspina Gene Database (M-GeneDB)

Silvia G. Acinas; Pablo Sánchez; Guillem Salazar; Francisco M. Cornejo-Castillo; Marta Sebastián; Ramiro Logares; Marta Royo-Llonch; Lucas Paoli; Shinichi Sunagawa; Pascal Hingamp; Hiroyuki Ogata; Gipsi Lima-Mendez; Simon Roux; José M. González; Jesús M. Arrieta; Intikhab S. Alam; Allan Kamau; Chris Bowler; Jeroen Raes; Stéphane Pesant; Peer Bork; Susana Agustí; Takashi Gojobori; Dolors Vaqué; Matthew B. Sullivan; Carlos Pedrós-Alió; Ramon Massana; Carlos M. Duarte; Josep M. Gasol

Improve Research Reproducibility A Bio-protocol resource

Concise Method

This method is extracted from research article: Commun Biol, May 2021

Deep ocean metagenomes provide insight into the metabolic architecture of bathypelagic microbial communities

DOI: 10.1038/s42003-021-02112-2

All 3,872,410 predicted coding sequences longer than 100 bp from the assembled metagenomes were pooled and clustered at 95% sequence identity and 90% alignment coverage of the shorter sequence with cd-hit-est⁹⁷ v.4.6, using the following options: -c 0.95 -T 0 -M 0 -G 0 -aS 0.9 -g 1 -r 1 -d 0. This yielded 1,115,269 non-redundant gene clusters (hereafter referred to simply as genes). These gene clusters were aligned to UniRef100⁹⁸ (release 2019-10-16) with diamond blastx⁹⁹ (v0.9.22; e-value 0.0001). The lowest common ancestor taxonomic assignment of the UniRef100 best matches was obtained from NCBI’s taxonomy database¹⁰⁰ (release 2020-01-30).
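The two command lines described above can be sketched as follows. This is a minimal illustration, not the authors' actual script: the clustering options and the e-value are taken from the text, while the file names, the helper function names, and the use of the long-form diamond flags (--query, --db, --out) are assumptions made for the example. The comments summarise each cd-hit-est option as documented in the cd-hit user guide.

```python
import shlex


def cd_hit_est_cmd(fasta_in, fasta_out):
    """Build the cd-hit-est call with the options quoted in the text.

    fasta_in / fasta_out are hypothetical paths, not from the source.
    """
    return [
        "cd-hit-est",
        "-i", fasta_in,    # pooled predicted coding sequences
        "-o", fasta_out,   # representative (non-redundant) sequences
        "-c", "0.95",      # sequence identity threshold (95%)
        "-T", "0",         # threads; 0 = use all available CPUs
        "-M", "0",         # memory limit in MB; 0 = unlimited
        "-G", "0",         # 0 = local rather than global sequence identity
        "-aS", "0.9",      # alignment must cover 90% of the shorter sequence
        "-g", "1",         # assign each sequence to its most similar cluster
        "-r", "1",         # compare both strands
        "-d", "0",         # 0 = keep the FASTA header up to the first space
    ]


def diamond_blastx_cmd(query, db, out):
    """Build a diamond blastx call with the e-value quoted in the text.

    query / db / out are hypothetical paths; db is assumed to be a
    DIAMOND-formatted UniRef100 database.
    """
    return [
        "diamond", "blastx",
        "--query", query,
        "--db", db,
        "--out", out,
        "--evalue", "0.0001",
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works
    # without cd-hit or diamond installed.
    print(shlex.join(cd_hit_est_cmd("pooled_cds.fna", "m_genedb.fna")))
    print(shlex.join(diamond_blastx_cmd("m_genedb.fna", "uniref100.dmnd", "hits.tsv")))
```

Building each command as a list of arguments keeps every option paired with its value and lets the list be passed unchanged to subprocess.run if the tools are installed.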

To explore the novelty of the M-GeneDB, we clustered it with the 46,775,154 non-redundant sequences of the Tara Oceans Microbial Reference Gene Catalog version 2 (OM-RGC.v2)³⁷ using cd-hit-est-2d⁹⁷ v.4.6 with the following options: -c 0.95 -T 48 -M 256000 -G 0 -aS 0.9 -g 1 -r 1 -d 0, obtaining a final catalog of 47,422,971 genes.
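The gene counts reported above imply a simple novelty estimate, sketched below. The three counts come from the text; the interpretation is an assumption of this example, namely that the final catalog contains every OM-RGC.v2 sequence plus only those M-GeneDB genes that did not cluster with any of them. The function and variable names are illustrative.

```python
# Counts reported in the text.
M_GENEDB_GENES = 1_115_269      # non-redundant M-GeneDB genes
OM_RGC_V2_GENES = 46_775_154    # non-redundant OM-RGC.v2 sequences
COMBINED_CATALOG = 47_422_971   # final catalog after cd-hit-est-2d


def novelty(query_genes, reference_genes, combined_genes):
    """Return (added, shared, added_fraction) for the query catalog.

    ASSUMPTION (not stated in the source): the combined catalog is the
    reference catalog plus the query genes that clustered with none of
    its sequences, so the difference counts the genes added by the query.
    """
    added = combined_genes - reference_genes
    shared = query_genes - added
    return added, shared, added / query_genes


added, shared, fraction = novelty(M_GENEDB_GENES, OM_RGC_V2_GENES, COMBINED_CATALOG)
print(f"M-GeneDB genes absent from OM-RGC.v2: {added:,}")
print(f"M-GeneDB genes clustering with OM-RGC.v2: {shared:,}")
print(f"Fraction of M-GeneDB added to the catalog: {fraction:.1%}")
```

Under that assumption, the difference between the final catalog and OM-RGC.v2 is 647,817 genes, or about 58% of the M-GeneDB.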

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
