Validation and testing

Janani Durairaj
Elena Melillo
Harro J. Bouwmeester
Jules Beekwilder
Dick de Ridder
Aalt D. J. van Dijk

Three validation schemes are used to test a classifier.

Random Split: A random five-fold cross-validation with 80%-20% train-test split.
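A minimal sketch of such a random five-fold split, using only the Python standard library. The function name `five_fold_splits`, the seed, and the sample count of 100 are illustrative assumptions; the source does not say how the folds were generated or whether they were stratified by class. In scikit-learn, `KFold(n_splits=5, shuffle=True)` produces the same kind of split.

```python
import random

def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds nearly equal folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, test_idx

# Hypothetical data set of 100 samples: each fold tests on 20 and trains on 80.
splits = list(five_fold_splits(100))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Every sample appears in exactly one test fold, so each round trains on 80% of the data and tests on the remaining 20%.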

Genus Split: A scheme in which cases from 65 genera are used for training and the rest for testing, repeated 10 times with different sets.
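The genus-level hold-out can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the 65 training genera are assumed to be drawn at random in each repeat (the source does not say how they were chosen), and `genus_splits`, `toy_genera`, and the seed are hypothetical names and values.

```python
import random

def genus_splits(genera, n_train_genera=65, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs that keep whole genera together.

    `genera` holds the genus label of each sample (hypothetical input).
    """
    rng = random.Random(seed)
    unique_genera = sorted(set(genera))
    for _ in range(n_repeats):
        # Assumption: training genera are sampled at random in each repeat.
        train_genera = set(rng.sample(unique_genera, n_train_genera))
        train_idx = [i for i, g in enumerate(genera) if g in train_genera]
        test_idx = [i for i, g in enumerate(genera) if g not in train_genera]
        yield train_idx, test_idx

# Toy data: 80 made-up genera with 3 samples each.
toy_genera = [f"genus_{i % 80}" for i in range(240)]
genus_split_list = list(genus_splits(toy_genera))
```

Because the split is made on genus labels rather than on individual samples, no genus contributes to both the training and the test set within a repeat.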

Clade Split: All dicot STSs are used for training and monocot and conifer STSs for testing.
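Unlike the other two schemes, the clade split is a single fixed partition. A minimal sketch, assuming each sample carries a clade label; the function `clade_split`, the label strings, and the toy list are hypothetical, and samples from any other clade are simply left out of both sets here.

```python
def clade_split(clades, train_clades=("dicot",),
                test_clades=("monocot", "conifer")):
    """Return (train_idx, test_idx) based on each sample's clade label."""
    train_idx = [i for i, c in enumerate(clades) if c in train_clades]
    test_idx = [i for i, c in enumerate(clades) if c in test_clades]
    return train_idx, test_idx

# Toy labels for six made-up samples; "other" belongs to neither set.
toy_clades = ["dicot", "monocot", "dicot", "conifer", "dicot", "other"]
clade_train, clade_test = clade_split(toy_clades)
```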

Three metrics are used to measure the performance of each classifier. Nerolidyl cation-specific synthases are treated as the positive class: TP and TN are the numbers of nerolidyl cation-specific and farnesyl cation-specific synthases, respectively, predicted correctly at a given threshold of predicted probability; FN and FP are the numbers of nerolidyl cation-specific and farnesyl cation-specific synthases, respectively, predicted incorrectly at that threshold. All metrics are calculated using the scikit-learn Python library [47].

Balanced accuracy (bAcc): $\frac{1}{2}\left(\frac{TP}{TP+FN}+\frac{TN}{TN+FP}\right)$ at a threshold of 0.5.
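Balanced accuracy is the mean of the true positive rate, TP/(TP + FN), and the true negative rate, TN/(TN + FP). The sketch below spells this out in plain Python on made-up values. It assumes label 1 marks a nerolidyl cation-specific synthase and that a predicted probability equal to the threshold counts as positive; neither convention is stated in the source. scikit-learn's `balanced_accuracy_score` computes the same quantity from hard class labels.

```python
def balanced_accuracy(y_true, y_prob, threshold=0.5):
    """Mean of sensitivity and specificity at a probability threshold."""
    # Assumption: probabilities at or above the threshold are called positive.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)

# Made-up labels (1 = nerolidyl cation-specific) and predicted probabilities.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_probs = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
bacc = balanced_accuracy(toy_labels, toy_probs)
```

In this toy case TP = 2, FN = 1, TN = 4, and FP = 1, so bAcc = (2/3 + 4/5)/2. Averaging the two rates keeps the larger class from dominating the score when the classes are imbalanced.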

Area Under the Receiver Operating Characteristic Curve (AUC): Calculated as the area under the plot of the fraction of TP out of the total number of nerolidyl cation-specific synthases vs. the fraction of FP out of the total number of farnesyl cation-specific synthases, at various threshold settings.
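To make the threshold sweep concrete, the sketch below builds the curve point by point and integrates it with the trapezoidal rule, in plain Python on made-up scores. The names `roc_points` and `trapezoid_area` are hypothetical, and label 1 is assumed to mark a nerolidyl cation-specific synthase. scikit-learn provides `roc_auc_score` for this metric; the source does not show the exact call that was used.

```python
def roc_points(y_true, y_score):
    """Return (FP rate, TP rate) points, one per distinct score threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_area(points):
    """Integrate a piecewise-linear curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Made-up labels (1 = nerolidyl cation-specific) and predicted probabilities.
roc_labels = [1, 1, 0, 1, 0, 0]
roc_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
auc_value = trapezoid_area(roc_points(roc_labels, roc_scores))
```

For these toy values, 8 of the 9 positive-negative pairs are ranked correctly, and the area comes out to 8/9.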

Area Under the Precision-Recall Curve (AUPRC): Calculated as the area under the plot of the precision (TP/(TP + FP)) vs. the recall (TP/(TP + FN)) at various threshold settings.
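The source does not say how the area under the precision-recall curve was integrated. scikit-learn offers a step-wise sum (`average_precision_score`) and a trapezoidal estimate (`precision_recall_curve` followed by `auc`), and the two can differ slightly. The sketch below assumes the step-wise form and computes it in plain Python on made-up scores; the function name `average_precision` is hypothetical, and label 1 is assumed to mark a nerolidyl cation-specific synthase.

```python
def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve.

    Sums precision at each threshold, weighted by the gain in recall.
    """
    n_pos = sum(y_true)
    area = 0.0
    prev_recall = 0.0
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        precision = tp / (tp + fp)
        recall = tp / n_pos
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area

# Made-up labels (1 = nerolidyl cation-specific) and predicted probabilities.
pr_labels = [1, 1, 0, 1, 0, 0]
pr_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
auprc_value = average_precision(pr_labels, pr_scores)
```

For these toy values the recall rises at three thresholds, with precisions 1, 1, and 3/4, giving an area of 11/12.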

Forty-two newly characterized synthases from the literature (listed in S1 Table) are used as the final independent test set.
