KEGG Orthology (KO) assignment was performed separately for each genome. For each species, the protein FASTA file was searched with BLASTp against the NCBI non-redundant (nr) protein database, using an e-value cut-off of 1e−10. From the BLAST output, the gene IDs and GenBank identifier (GI) numbers were retrieved and sorted. The GI numbers were then converted to UniProt IDs, and these in turn to K numbers, using an in-house ID-mapping Python script available at github.com/dieunelderilus/picoeukaryotes/blob/master/gi_kO_mapper.py. Briefly, the script takes as input a table of gene IDs and GI numbers for a given species and outputs a comma-separated table that links each gene ID to its corresponding UniProt ID and K number (GeneID→GI→UniProt ID→K).
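The chained lookup (GeneID→GI→UniProt ID→K) can be illustrated with a minimal Python sketch. This is not the authors' gi_kO_mapper.py; the function names, column headers, and the identifiers in the example are hypothetical placeholders, and the two mapping tables (GI→UniProt and UniProt→K) are assumed to have been loaded into dictionaries beforehand.

```python
import csv
import io


def map_genes_to_ko(gene_to_gi, gi_to_uniprot, uniprot_to_ko):
    """Chain the lookups GeneID -> GI -> UniProt ID -> K number.

    Genes whose chain breaks at any step keep empty strings for the
    missing fields instead of being dropped (an assumption of this sketch).
    """
    rows = []
    for gene_id, gi in gene_to_gi.items():
        uniprot = gi_to_uniprot.get(gi, "")
        ko = uniprot_to_ko.get(uniprot, "") if uniprot else ""
        rows.append((gene_id, gi, uniprot, ko))
    return rows


def rows_to_csv(rows):
    """Serialize the mapped rows as a comma-separated table with a header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["gene_id", "gi", "uniprot_id", "ko"])  # hypothetical headers
    writer.writerows(rows)
    return buf.getvalue()


# Placeholder identifiers for illustration only; they are not real records.
example_gene_to_gi = {"gene_0001": "GI_A", "gene_0002": "GI_B"}
example_gi_to_uniprot = {"GI_A": "UNIPROT_A"}
example_uniprot_to_ko = {"UNIPROT_A": "K_A"}

example_rows = map_genes_to_ko(
    example_gene_to_gi, example_gi_to_uniprot, example_uniprot_to_ko
)
print(rows_to_csv(example_rows), end="")
```

Keeping unmapped genes in the output with empty fields makes it easy to see how many genes lost their annotation at each step; a script could equally drop them, depending on the downstream analysis.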