Here we provide the datasets used to generate the main figures of the paper "Machine learning enables prediction of metabolic system evolution in bacteria" (DOI: 10.1126/sciadv.adc9130). The zip file contains two files: a phylogenetic tree and a gene presence/absence profile covering every extant and ancestral species.
- /Bio-protocol-data/Bacterial_tree.nwk: A Newick-format file of the bacterial reference phylogeny, extracted from the GTDB reference phylogeny (release r89).
- /Bio-protocol-data/KEGGOG_Species_PresenceAbsence.list: A tab-separated (TSV) file giving the presence/absence profile of every ortholog group (OG) for every tip node (extant species) and every internal node (ancestor) of `/Bio-protocol-data/Bacterial_tree.nwk`. The file has one row per pair of an OG and a node. In each row, the first, second, and third columns give the OG name (a KEGG ortholog group), the node name, and the presence/absence state, respectively. The state is 1 (present) or 0.5 (uncertain; applies to ancestors). If no row exists for a given pair of an OG and a node, the OG is absent from the genome corresponding to that node.
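The sparse encoding above (no row means absent) can be loaded with a minimal sketch like the one below, which uses only the Python standard library. It assumes the file has no header row and exactly the three tab-separated columns described; the function names, the KEGG IDs, and the node names in the usage lines are hypothetical toy values, not taken from the dataset.

```python
import csv
import io
from collections import defaultdict


def load_presence_absence(lines):
    """Build {node: {og: state}} from tab-separated rows of (OG, node, state).

    Assumes no header row. Only the pairs that have a row are stored; any
    OG/node pair without a row is treated as absent by `state` below.
    """
    profile = defaultdict(dict)
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        og, node, value = row[0], row[1], float(row[2])
        profile[node][og] = value
    return profile


def state(profile, node, og):
    """Return 1.0 (present), 0.5 (uncertain) or 0.0 (absent: no row in the file)."""
    # dict.get does not trigger defaultdict's factory, so lookups never add keys
    return profile.get(node, {}).get(og, 0.0)


# Hypothetical toy rows in the documented column order: OG, node name, state.
example = io.StringIO(
    "K00001\tSpecies_A\t1\n"
    "K00001\tNode_1\t0.5\n"
    "K00002\tNode_1\t1\n"
)
profile = load_presence_absence(example)
print(state(profile, "Species_A", "K00001"))  # 1.0
print(state(profile, "Node_1", "K00001"))     # 0.5
print(state(profile, "Species_A", "K00002"))  # 0.0 (no row, so absent)
```

On the real data, assuming the zip is extracted into the working directory, one would call `load_presence_absence(fh)` on a handle opened with `open("Bio-protocol-data/KEGGOG_Species_PresenceAbsence.list", newline="")`. Storing only the listed pairs mirrors the file's own sparse layout, so memory grows with the number of rows rather than with the number of OGs × nodes.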
The files above were used directly as input files for Evodictor in the study (10.1126/sciadv.adc9130).
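Because the TSV is meant to cover every tip and internal node of the tree, a quick consistency check between the two input files can be sketched as follows. This assumes, hypothetically, that internal nodes in `Bacterial_tree.nwk` carry labels matching the node names in column 2 of the TSV. `newick_labels` and `tsv_node_names` are illustrative helpers, not part of Evodictor; the parser is deliberately minimal (quoted labels and branch lengths only, no `[...]` comments), and a full parser such as Biopython's `Bio.Phylo.read(path, "newick")` would be more robust.

```python
import csv
import io


def newick_labels(text):
    """Return tip and internal-node labels of a Newick string in file order.

    Minimal sketch: handles quoted labels ('' is an escaped quote) and skips
    branch lengths after ':', but does not handle [...] comments.
    """
    labels = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c in "(),;" or c.isspace():
            i += 1  # structural character or whitespace
        elif c == ":":
            i += 1  # skip the branch length up to the next delimiter
            while i < n and text[i] not in ",);":
                i += 1
        elif c == "'":
            i += 1  # quoted label
            buf = []
            while i < n:
                if text[i] == "'":
                    if i + 1 < n and text[i + 1] == "'":
                        buf.append("'")  # '' encodes a literal quote
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                buf.append(text[i])
                i += 1
            labels.append("".join(buf))
        else:
            start = i  # unquoted label runs until the next delimiter
            while i < n and text[i] not in "(),:;":
                i += 1
            labels.append(text[start:i].strip())
    return labels


def tsv_node_names(lines):
    """Collect the node names (second column) of the presence/absence TSV."""
    return {row[1] for row in csv.reader(lines, delimiter="\t") if row}


# Hypothetical toy inputs; Node_9 appears in the TSV but not in the tree.
tree = "((Species_A:0.1,Species_B:0.2)Node_1:0.3,Species_C:0.4)Root;"
tsv = io.StringIO("K00001\tSpecies_A\t1\nK00001\tNode_1\t0.5\nK00002\tNode_9\t1\n")
print(sorted(tsv_node_names(tsv) - set(newick_labels(tree))))  # ['Node_9']
```

An empty list from the last line would indicate that every node named in the TSV is also labeled in the tree, under the labeling assumption stated above.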
Copyright: Content may be subject to copyright.
How to cite: Readers should cite both the Bio-protocol preprint and the original research article in which this protocol was used:
- Konno, N. and Iwasaki, W. (2023). A reference phylogenetic tree and a gene presence/absence profile. Bio-protocol Preprint. bio-protocol.org/prep2277.
- Konno, N. and Iwasaki, W. (2023). Machine learning enables prediction of metabolic system evolution in bacteria. Science Advances 9(2). DOI: 10.1126/sciadv.adc9130