Orthology identification

Behrad Vahdati Nia
Christine Kang
Michelle G. Tran
Deborah Lee
Shin Murakami

To find the appropriate ortholog counterparts of AD genes in C. elegans, we performed matching registry tests using WormBase (www.wormbase.org; accessed in 2016; Stein et al., 2001). WormBase is a consortium database that contains biological and genomic information on C. elegans and related nematodes. It is regularly updated with new research submissions and provides lists of homologous genes between C. elegans and other species. The default search menu did not consistently return the genes we were looking for, so we performed multiple searches using gene names, protein names, and locus IDs to generate a complete list. As an additional measure to avoid missing any genes, we also used BLASTP (protein-protein Basic Local Alignment Search Tool) match results from WormBase to confirm that the gene matches obtained through the default browser search agreed with the sequence-similarity results (Wheeler et al., 2008).

To validate the completeness of the WormBase results, we used OrthoList as a second database to obtain a second list of orthologous AD gene counterparts in C. elegans. We also chose OrthoList as a control database because it contains a fixed number of genes (Aug. 2013) (Shaye and Greenwald, 2011). OrthoList is a database compiled from a meta-analysis of four programs that predict orthologous genes (Shaye and Greenwald, 2011; accessed in 2016): Ensembl Compara, InParanoid, Homologene, and OrthoMCL (described separately below). We then compared the matched registry test results from WormBase against those from OrthoList and identified the AD genes matched by both. The list of genes generated by OrthoList was also divided based on the number of C. elegans genes associated with each human AD gene. Where more than one C. elegans gene was associated with a single human gene, the genes were labeled as orthologs with multiple WormBase IDs; where only one C. elegans gene was associated with a single human gene, it was labeled as an ortholog with a single WormBase ID.
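The comparison and labeling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the authors ran: the function names `group_by_human_gene` and `compare_sources` are hypothetical, each source is assumed to have been exported as (human gene, C. elegans gene ID) pairs, and the identifiers are placeholders rather than real gene names or WormBase IDs.

```python
from collections import defaultdict


def group_by_human_gene(pairs):
    """Collect the set of C. elegans gene IDs reported for each human gene."""
    grouped = defaultdict(set)
    for human_gene, worm_id in pairs:
        grouped[human_gene].add(worm_id)
    return dict(grouped)


def compare_sources(wormbase_pairs, ortholist_pairs):
    """Return human genes found in both sources, plus the OrthoList genes
    split by whether they map to one or to several C. elegans gene IDs."""
    wormbase = group_by_human_gene(wormbase_pairs)
    ortholist = group_by_human_gene(ortholist_pairs)
    matched_by_both = sorted(set(wormbase) & set(ortholist))
    single_id = sorted(g for g, ids in ortholist.items() if len(ids) == 1)
    multiple_ids = sorted(g for g, ids in ortholist.items() if len(ids) > 1)
    return matched_by_both, single_id, multiple_ids


# Placeholder identifiers only; these are not real genes or WormBase IDs.
wormbase_pairs = [
    ("HUMAN_A", "WB_ID_1"),
    ("HUMAN_B", "WB_ID_2"),
    ("HUMAN_C", "WB_ID_4"),
]
ortholist_pairs = [
    ("HUMAN_A", "WB_ID_1"),
    ("HUMAN_B", "WB_ID_2"),
    ("HUMAN_B", "WB_ID_3"),
]

matched, single, multiple = compare_sources(wormbase_pairs, ortholist_pairs)
print(matched)   # human genes present in both lists
print(single)    # OrthoList genes with a single C. elegans ID
print(multiple)  # OrthoList genes with multiple C. elegans IDs
```

In this toy input, HUMAN_C appears only in the WormBase list and is therefore excluded from the genes matched by both sources.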

Ensembl Compara uses both sequence-level and gene-level analysis to obtain cross-species data, which are represented as phylogenetic trees. Its protein trees include a protein associated with each gene and use NCBI BLAST+ e-values to assess the level of homology. These proteins are then clustered and aligned using different techniques (ensemblgenomes.org; accessed in 2016; Kersey et al., 2016).

InParanoid uses NCBI BLAST to build orthologous groups between two complete proteomes (Remm et al., 2001). Each orthologous group contains two seed orthologs, which are determined by two-way best hits between the proteomes. Additional sequences are added to each group based on how close they are to the corresponding seed ortholog (inparanoid51.sbc.su.se/cgi-bin/faq.cgi, 2001).
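The two-way best hit step can be illustrated with a short Python sketch. It is a simplified stand-in, not the InParanoid implementation: the function names `best_hits` and `seed_orthologs` are hypothetical, the input is assumed to be a dictionary of pairwise similarity scores in each search direction, the scores and sequence names are invented for illustration, and the later step of adding further sequences to each group is not covered.

```python
def best_hits(scores):
    """Map each query sequence to its highest-scoring target sequence.

    `scores` is a dict of {(query, target): similarity_score}.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def seed_orthologs(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Invented scores: proteome A searched against B, and B against A.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 90}
b_vs_a = {("b1", "a1"): 195, ("b2", "a1"): 140, ("b2", "a2"): 95}

seeds = seed_orthologs(a_vs_b, b_vs_a)
print(seeds)
```

Here a1 and b1 pick each other, so they form a seed pair. The best hit of a2 is b2, but the best hit of b2 is a1, so that pair is not reciprocal and is left out.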

Homologene compares the protein and sequence makeup of different species that have either a complete genome or at least 10,000 UniGene entries to create putative homology groups. BLASTP is used to assess the homology of the genes, and the species are then divided based on the similarity of their genomic makeup (www.ncbi.nlm.nih.gov/homologene, 2007).

OrthoMCL uses BLASTP on proteins and computes the percent match of sequences based on their length (Li et al., 2003). A threshold is applied to the BLAST results so that only matches with e-values of less than 1e-5 are retained. Based on these results, possible ortholog, inparalog, and co-ortholog pairs are obtained. Lastly, OrthoMCL is used to cluster these pairs into groups (http://orthomcl.org/orthomcl/, 2014).
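The e-value threshold can be shown with a minimal Python sketch. It assumes the BLASTP results are available in the standard BLAST+ tabular format (`-outfmt 6`), where the e-value is the 11th tab-separated column. The function name `filter_hits` and the example rows are hypothetical, and the sketch covers only the filtering step, not the pair identification or clustering that OrthoMCL performs afterwards.

```python
E_VALUE_CUTOFF = 1e-5  # threshold stated in the text


def filter_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Keep tabular BLAST hits whose e-value is strictly below the cutoff.

    Each line is assumed to follow BLAST+ `-outfmt 6` default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue < cutoff:
            kept.append((fields[0], fields[1], evalue))
    return kept


# Invented example rows with placeholder sequence names.
example_lines = [
    "q1\ts1\t80.0\t100\t20\t0\t1\t100\t1\t100\t3e-40\t180",
    "q1\ts2\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.2\t25",
    "q2\ts3\t45.0\t60\t33\t0\t1\t60\t1\t60\t1e-5\t40",
]

hits = filter_hits(example_lines)
print(hits)
```

Only the first row passes in this example. The third row has an e-value equal to the cutoff and is dropped because the comparison is strict ("less than 1e-5").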
