A reliable data [1618] is necessary for a robust model. The benchmark datasets constructed by Zhang et al. [15] were used in our work. It contained 17,403 BLPs composed of three species, namely, bacteria, eukaryote, and archaea, which were collected from UniProt (Jul. 2016). Therefore, four benchmark datasets were generated corresponding to a general and three species-specific datasets (bacteria, eukaryote, and archaea). To avoid homology bias and remove redundant sequences from the benchmark datasets, BLASTClust [19] was utilized to cluster all these protein sequences by setting the cutoff of sequence identity at 30%. And then, one protein was randomly picked from each cluster as the representative. Thus, 863 BLPs were obtained as positive samples. Among these BLPs, 748 belong to bacteria, 70 belong to eukaryote, and 45 belong to archaea. Additionally, 7093 nonredundant non-BLPs were collected to construct the negative samples that consist of 4919, 1426, and 748 proteins of bacteria, eukaryote, and archaea, respectively. Moreover, to construct balanced training dataset, 80% of the positive samples and equal number of negative samples were randomly picked out for training model. The rest positive and negative samples were used for independent testing. As a result, the final four benchmark datasets are constructed and summarized in Table 1. All data are available at http://lin-group.cn/server/iBLP/download.html.

The constructed benchmark datasets for BLP prediction.

Note: The content above has been extracted from a research article, so it may not display correctly.



Q&A
Please log in to submit your questions online.
Your question will be posted on the Bio-101 website. We will send your questions to the authors of this protocol and Bio-protocol community members who are experienced with this method. you will be informed using the email address associated with your Bio-protocol account.



We use cookies on this site to enhance your user experience. By using our website, you are agreeing to allow the storage of cookies on your computer.