Here we provide the datasets used to generate the main figures of the paper "Machine learning enables prediction of metabolic system evolution in bacteria" (DOI: 10.1126/sciadv.adc9130). The zip file contains two files: a phylogenetic tree and a gene presence/absence profile for every extant and ancestral species.
/Bio-protocol-data/Bacterial_tree.nwk : A Newick format file of the bacterial reference phylogeny extracted from GTDB reference phylogeny (r89).
/Bio-protocol-data/KEGGOG_Species_PresenceAbsence.list : A TSV file of the presence/absence profile of every ortholog group (OG) for every tip node (extant species) and every internal node (ancestors) of `/Bio-protocol-data/Bacterial_tree.nwk`. This file has one row for each pair of an OG and an internal/tip node. Every row's first, second, and third columns indicate the OG name (KEGG ortholog group), node name, and the presence/absence state, respectively. The presence/absence state is represented as 1 (present) or 0.5 (uncertain; for ancestors). If there is no row for a pair of an OG and a node in the phylogeny, the OG is absent in the genome corresponding to the node.
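The row layout described above can be read with a short Python sketch like the one below. It is a minimal illustration, not the authors' code, and rests on several assumptions: that the file has no header row, that the columns are tab-separated in the order OG, node, state, and that a missing (OG, node) pair can be treated as 0 (absent). The function names `load_presence_absence` and `get_state`, and the node names `tip_A` and `node_1`, are hypothetical; the demo rows are made up and are not taken from the dataset.

```python
import csv
from collections import defaultdict


def load_presence_absence(lines):
    """Parse (OG, node, state) rows into a nested dict {node: {OG: state}}.

    `lines` is any iterable of tab-separated text lines, for example an
    open file handle. Assumes there is no header row.
    """
    profile = defaultdict(dict)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        og, node, state = row[0], row[1], float(row[2])
        profile[node][og] = state
    return profile


def get_state(profile, node, og):
    """Return 1.0 (present), 0.5 (uncertain), or 0.0 (no row, i.e. absent)."""
    return profile.get(node, {}).get(og, 0.0)


# Made-up rows for illustration only; real node names come from the tree file.
demo_rows = [
    "K00001\ttip_A\t1",
    "K00001\tnode_1\t0.5",
]
demo = load_presence_absence(demo_rows)
print(get_state(demo, "tip_A", "K00001"))
print(get_state(demo, "node_1", "K00001"))
print(get_state(demo, "tip_A", "K00002"))
```

To read the real file, an open handle can be passed in place of `demo_rows`, for example `open("Bio-protocol-data/KEGGOG_Species_PresenceAbsence.list", newline="")`. Because absent pairs are not stored, the nested dictionary stays sparse, which mirrors how the file itself encodes absence.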
Readers should cite both the Bio-protocol preprint and the original research article where this protocol was used:
Konno, N. and Iwasaki, W. (2023). A reference phylogenetic tree and a gene presence/absence profile. Bio-protocol Preprint. bio-protocol.org/prep2277.
Konno, N. and Iwasaki, W. (2023). Machine learning enables prediction of metabolic system evolution in bacteria. Science Advances 9(2). DOI: 10.1126/sciadv.adc9130
This protocol preprint was submitted via the "Request a Protocol" track.