Inter‐database identifier mapping (ID mapping) of CpDAA residues between UniProtKB and ENSPs

Maria F Palafox
Heta S Desai
Valerie A Arboleda
Keriann M Backus

Two methods were used to cross‐reference stable or versioned protein IDs between UniProtKB and five Ensembl releases:

Ensembl mapping: Ensembl mapping (“xref”) files from the five releases studied (v85, v92, v94, v96, and v97) were used for inter‐database identifier mapping. Ensembl gene (ENSG), transcript, and associated protein IDs cross‐referencing the curated set of 3,953 CpD UniProtKB stable IDs were extracted and grouped by the single‐ or multi‐isoform status of the cross‐referenced UniProtKB entry. The Ensembl IDs cross‐referencing UniProtKB CpD protein IDs were then used to filter the five release‐specific Ensembl peptide FASTA files for the associated protein sequences.
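The Ensembl mapping step can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the xref file is tab‐separated with a header row containing columns named protein_stable_id and xref (an assumption about the file layout), and that each peptide FASTA header begins with a versioned protein ID such as ENSP00000000001.2. The function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, Set


def ensp_ids_from_xref(tsv_lines: Iterable[str], uniprot_ids: Set[str]) -> Set[str]:
    """Collect Ensembl protein IDs whose xref is one of the UniProtKB IDs.

    Assumes a header row with 'protein_stable_id' and 'xref' columns
    (an assumed layout, check it against the actual xref file).
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    ensp: Set[str] = set()
    for row in reader:
        protein_id = row.get("protein_stable_id")
        if row.get("xref") in uniprot_ids and protein_id:
            ensp.add(protein_id)
    return ensp


def filter_peptide_fasta(lines: Iterable[str], wanted_ensp: Set[str]) -> Dict[str, str]:
    """Return {versioned ENSP ID: sequence} for records whose stable ID is wanted."""
    kept: Dict[str, str] = {}
    current = None  # versioned ID of the record being kept, None while skipping
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            versioned_id = line[1:].split()[0]      # first token of the header
            stable_id = versioned_id.split(".")[0]  # drop the version suffix
            current = versioned_id if stable_id in wanted_ensp else None
            if current is not None:
                kept[current] = ""
        elif current is not None:
            kept[current] += line                   # join wrapped sequence lines
    return kept
```

Running both functions once per Ensembl release (v85, v92, v94, v96, and v97) would give one filtered sequence set per release. Matching on the stable ID while keeping the versioned ID as the key preserves the release‐specific version of each protein.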

UniProtKB isoform‐specific mapping: The UniProtKB ID mapping file (idmapping.dat) from the August 01, 2018 release was used for inter‐database identifier mapping. Ensembl IDs cross‐referenced by the canonical protein isoform IDs of multi‐isoform UniProtKB entries and by the stable IDs of single‐isoform entries were pooled and used to filter the release‐specific Ensembl peptide FASTA files for the associated protein sequences.
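The pooling step can be sketched in a similar way. This is an illustration under stated assumptions, not the authors' implementation. It relies on the three tab‐separated columns of idmapping.dat (UniProtKB accession, ID type, mapped ID) and on the Ensembl, Ensembl_TRS, and Ensembl_PRO ID types. The function name is hypothetical, and the set of query accessions (canonical isoform IDs for multi‐isoform entries, plain stable IDs for single‐isoform entries) is assumed to have been prepared beforehand.

```python
from typing import Dict, Iterable, Set

# ID types in idmapping.dat that point to Ensembl gene, transcript, and protein IDs.
ENSEMBL_TYPES = ("Ensembl", "Ensembl_TRS", "Ensembl_PRO")


def pool_ensembl_ids(
    idmapping_lines: Iterable[str], query_accessions: Set[str]
) -> Dict[str, Set[str]]:
    """Pool Ensembl IDs cross-referenced by the given UniProtKB accessions.

    query_accessions is assumed to hold canonical isoform IDs for
    multi-isoform entries and stable IDs for single-isoform entries.
    """
    pooled: Dict[str, Set[str]] = {id_type: set() for id_type in ENSEMBL_TYPES}
    for raw in idmapping_lines:
        parts = raw.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed or empty lines
        accession, id_type, mapped_id = parts
        if accession in query_accessions and id_type in pooled:
            pooled[id_type].add(mapped_id)
    return pooled
```

The Ensembl_PRO set returned here could then be used to filter each release‐specific peptide FASTA file by protein ID.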
