Evaluating PPIO

Mansheng Li; Qiang He; Chunyuan Yang; Jie Ma; Fuchu He; Tao Chen; Yunping Zhu


Concise Method


This method is extracted from research article: BMC Genomics, Nov 2021

The protein-protein interaction ontology: for better representing and capturing the biological context of protein interaction

DOI: 10.1186/s12864-021-07827-4


To assess the structural and functional features of PPIO, we first applied it to capture PPI annotations from the literature. This was done on an open standard corpus by annotating the extracted PPIs with PPIO terms and assessing the performance. We then employed PPIO to navigate PPI information.

Annotating PPIs based on PPIO. To annotate extracted PPIs, we proposed a PPIO-based approach that identifies PPIO terms occurring in the same sentence as the target PPI and assigns them to that PPI. The co-occurrence of a PPI and a PPIO term in one sentence suggests that the term represents a type of annotation of that PPI.
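As a rough illustration of this co-occurrence idea, the sketch below looks up dictionary terms in a single sentence, case-insensitively, and keeps the longest of any overlapping matches, following the rule described under "Assigning annotations to related PPIs based on PPIO". It is not the authors' implementation. The mini-dictionary `PPIO_TERMS`, the function name `find_ppio_terms`, the protein names in the example sentence and the whole-word matching rule are all assumptions made for illustration.

```python
import re

# Hypothetical mini-dictionary: term or synonym -> annotation aspect.
# The real PPIO vocabulary is much larger; these entries are illustrative only.
PPIO_TERMS = {
    "regulation": "biological process",
    "regulation of transcription": "biological process",
    "nucleus": "subcellular location",
    "yeast two-hybrid": "detection method",
}


def find_ppio_terms(sentence, terms=PPIO_TERMS):
    """Return (term, aspect) pairs for the longest non-overlapping matches."""
    # Collect every case-insensitive occurrence of every term.
    # Whole-word matching is an assumption, not stated in the source.
    candidates = []
    for term, aspect in terms.items():
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            candidates.append((m.start(), m.end(), term, aspect))

    # Visit longer matches first, then earlier ones, and drop any
    # candidate that overlaps a match already kept.
    candidates.sort(key=lambda c: (-(c[1] - c[0]), c[0]))
    kept = []
    for start, end, term, aspect in candidates:
        if all(end <= s or start >= e for s, e, _, _ in kept):
            kept.append((start, end, term, aspect))

    kept.sort()  # restore left-to-right order in the sentence
    return [(term, aspect) for _, _, term, aspect in kept]


# Made-up sentence containing a PPI (ProtA, ProtB) and two dictionary terms.
sentence = "ProtA binds ProtB in the nucleus, leading to Regulation of transcription."
annotations = find_ppio_terms(sentence)
for term, aspect in annotations:
    print(f"{aspect}: {term}")
```

In this example both "regulation" and "regulation of transcription" match at the same position, and only the longer term is kept. In the method itself, the automatically assigned terms were additionally validated manually.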

Corpus and preprocessing. A corpus named “BioCreAtIvE-PPI” [26] (see Table S3 in Additional file 3) was used to evaluate the efficacy of PPIO-based annotation extraction. This dataset originated from the BioCreAtIvE Task [27] corpus. A total of 173 sentences, containing 255 interactions, were randomly selected from the BioCreAtIvE corpus by the original PPI curator. For these sentences, each of which contained at least one PPI, additional annotations covering six aspects of PPIs were curated manually by individual annotators according to the PPIO schema. In total, 71 roles/statuses of interactors, 91 biological processes (BPs), 17 subcellular locations (SCLs), 274 interaction types (ITs), 53 biological functions (BFs) and 43 detection methods (DMs) of PPIs were labeled on the original “BioCreAtIvE-PPI” corpus. This newly curated corpus (see Table S4 in Additional file 4) was then used in the evaluation procedure. To create the reference corpus, the annotators were asked to keep in mind the breadth and depth of PPIO and to consider not only the superclass concepts but also their corresponding subclass concepts and synonyms.

Assigning annotations to related PPIs based on PPIO. We used the terms of PPIO as a dictionary for PPI annotation extraction. A three-step PPIO-based approach was proposed to accomplish the annotation task. First, a string matching algorithm was applied to recognize, case-insensitively, all names and synonyms of PPIO terms in sentences containing PPIs. Then, in the case of multiple matches, the longest match was selected. For instance, when the terms “regulation” and “regulation of transcription” were both identified, “regulation of transcription” was selected. Finally, the results were validated manually, and the performance of the PPIO-based approach was evaluated by comparing the automatically annotated corpus with the manually curated corpus described above. Three commonly used metrics, i.e., precision, recall and F-score, were used to measure the performance of the PPI annotation extraction:

$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \quad (1)$

$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \quad (2)$

$\text{F-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (3)$

where true positive is the number of entities found by the PPIO-based text mining system that matched annotations in the manually curated corpus, false positive is the number of entities automatically assigned by the PPIO-based text mining system that could not be matched to any annotation in the manually curated corpus, and false negative is the number of manually curated annotations that were not found by the PPIO-based approach. Higher precision, recall and F-score indicate better performance. Further details of the evaluation material and methods are provided in Additional file 13.
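The three metrics in Eqs. (1) to (3) can be computed from two sets of annotations, as in the minimal sketch below. It assumes that each annotation is represented as a (sentence id, term) pair and that a prediction counts as a true positive only on exact equality with a curated annotation. The source does not specify the matching criterion, and the function name `evaluate` and the example data are invented for illustration.

```python
def evaluate(predicted, gold):
    """Return (precision, recall, F-score) for two collections of annotations."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)   # found automatically and present in the curated corpus
    fp = len(predicted - gold)   # found automatically but absent from the curated corpus
    fn = len(gold - predicted)   # curated annotations that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return precision, recall, f_score


# Made-up annotations as (sentence id, term) pairs; not data from the study.
gold = {
    (1, "nucleus"),
    (1, "regulation of transcription"),
    (2, "phosphorylation"),
    (3, "yeast two-hybrid"),
}
predicted = {
    (1, "nucleus"),
    (1, "regulation of transcription"),
    (2, "binding"),
    (3, "yeast two-hybrid"),
}

precision, recall, f_score = evaluate(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} F-score={f_score:.2f}")
```

Here three of the four predictions match curated annotations, one prediction is spurious and one curated annotation is missed, so precision, recall and F-score all equal 0.75. Returning 0.0 when a denominator is zero is a convention chosen for this sketch.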

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
