Assessing identifier multi‐mapping between UniProtKB and Ensembl

Maria F Palafox
Heta S Desai
Valerie A Arboleda
Keriann M Backus

For Method A ID mapping, the total number of unique Ensembl IDs (versioned and stable) across five releases that cross‐reference CpD UniProt proteins was calculated for each UniProtKB ID. The mean number of unique multi‐mapping Ensembl IDs per CpD UniProtKB protein ID was then calculated separately for single‐isoform and multi‐isoform entries. Sequence identity was checked for every cross‐referenced pair of Ensembl and UniProtKB proteins and recorded in an additional Boolean column (“True” when the Ensembl and UniProt canonical protein sequences are identical, “False” when they are not; see GitHub for the Python script). For Method B ID mapping, the same analysis was applied: identifier multi‐mapping was calculated for single‐isoform and multi‐isoform UniProtKB entries, and sequence identity of cross‐referenced proteins was recorded in an additional Boolean column. Student's unpaired t‐test was used to assess all differences in ID multi‐mapping between versioned and stable Ensembl gene (ENSG), transcript, and protein IDs cross‐referencing our curated set of 3,953 CpD UniProt protein IDs found in all Ensembl release‐specific mapping files.
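The counting, identity check, and t‐test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' GitHub script: the input rows, accession numbers, release numbers, and the function names `stable_id`, `count_unique_ids`, `identical_sequence`, and `student_t` are all hypothetical, and the stable ID is assumed to be the versioned ID with its `.N` suffix removed.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance

# Hypothetical cross-reference rows: (UniProtKB ID, versioned Ensembl ID, release).
# All identifiers and release numbers here are made up for illustration.
rows = [
    ("P00001", "ENSP00000000001.1", 97),
    ("P00001", "ENSP00000000001.2", 98),
    ("P00001", "ENSP00000000002.1", 98),
    ("P00002", "ENSP00000000003.4", 97),
    ("P00002", "ENSP00000000003.4", 98),
]


def stable_id(ensembl_id):
    """Drop the version suffix, e.g. 'ENSP00000000001.2' -> 'ENSP00000000001'."""
    return ensembl_id.split(".", 1)[0]


def count_unique_ids(rows, versioned=True):
    """Count unique Ensembl IDs seen across all releases for each UniProtKB ID."""
    seen = defaultdict(set)
    for uniprot_id, ensembl_id, _release in rows:
        seen[uniprot_id].add(ensembl_id if versioned else stable_id(ensembl_id))
    return {uniprot_id: len(ids) for uniprot_id, ids in seen.items()}


def identical_sequence(uniprot_seq, ensembl_seq):
    """Boolean column value: True only when the two sequences match exactly."""
    return uniprot_seq == ensembl_seq


def student_t(a, b):
    """Unpaired Student's t statistic with pooled variance; returns (t, df).

    Assumes each sample has at least two values and the pooled variance
    is non-zero. Converting t to a p-value needs a t-distribution CDF,
    which is left to a statistics library.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


versioned_counts = count_unique_ids(rows, versioned=True)
stable_counts = count_unique_ids(rows, versioned=False)

# Compare multi-mapping counts for versioned vs. stable IDs.
t_stat, dof = student_t(
    list(versioned_counts.values()), list(stable_counts.values())
)
```

In this toy input, versioned IDs inflate the count for the first protein because two versions of the same stable ID are counted separately, which is the kind of difference the t‐test is meant to assess. In practice the same counting would be run separately on single‐isoform and multi‐isoform entries and on gene, transcript, and protein IDs.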
