According to a study by Mayr et al., machine learning predictions on chemical databases are subject to several potential biases [22]. First, compound series bias arises easily because databases contain many highly similar compounds that share common scaffolds [25]. A prediction model optimized for a specific scaffold may therefore overestimate performance and be unsuitable for activity prediction of novel drugs. Second, hyperparameter selection tuned to a specific algorithm is also problematic because it prevents a fair comparison between methods. To mitigate these biases, we adopted nested cluster cross validation for the comparison between the various DNNs [35]. Details of the nested cluster cross validation are described in Supplementary File S-2.
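The idea of nested cluster cross validation can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: real scaffold clustering (e.g., Bemis-Murcko scaffolds) is replaced here with random cluster labels, and a simple logistic regression stands in for a DNN. The outer loop estimates performance on held-out clusters, while the inner loop selects hyperparameters using only the training clusters, so neither compound series nor hyperparameter choice leaks into the test estimate.

```python
# Sketch of nested cluster cross validation (hypothetical stand-in data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
# Hypothetical cluster labels standing in for scaffold clusters.
clusters = rng.integers(0, 10, size=len(y))

outer = GroupKFold(n_splits=5)
outer_aucs = []
for train_idx, test_idx in outer.split(X, y, groups=clusters):
    # Inner loop: hyperparameter selection on the training clusters only.
    best_c, best_auc = None, -np.inf
    inner = GroupKFold(n_splits=3)
    for c in (0.01, 0.1, 1.0):
        scores = []
        for tr, va in inner.split(X[train_idx], y[train_idx],
                                  groups=clusters[train_idx]):
            model = LogisticRegression(C=c, max_iter=1000)
            model.fit(X[train_idx][tr], y[train_idx][tr])
            scores.append(roc_auc_score(
                y[train_idx][va],
                model.predict_proba(X[train_idx][va])[:, 1]))
        if np.mean(scores) > best_auc:
            best_auc, best_c = np.mean(scores), c
    # Refit with the selected hyperparameter and score on held-out clusters.
    model = LogisticRegression(C=best_c, max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    outer_aucs.append(roc_auc_score(
        y[test_idx], model.predict_proba(X[test_idx])[:, 1]))

print(round(float(np.mean(outer_aucs)), 3))
```

Because `GroupKFold` never places the same cluster in both training and validation, compounds sharing a scaffold cannot appear on both sides of a split.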
ROC-AUC is one of the most common metrics for evaluating binary classification performance (active versus inactive). However, in multi-task learning, a single ROC curve computed across all targets cannot represent the prediction accuracy for individual targets: some tasks may suffer from low predictive power while others display high accuracy. To detect such skewed performance, we used target-AUC, which is the ROC-AUC for each task averaged over the validation folds [22]. Evaluating the mean and variance of target-AUC across targets represents the activity-prediction performance for the respective targets more precisely. Moreover, the robustness of multi-task architectures can be evaluated by counting how many tasks outperform single-task learning [23].
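The target-AUC computation described above can be sketched as below. The data here are synthetic placeholders (random labels with weakly informative scores), not the study's predictions; the point is only the aggregation: compute ROC-AUC per task within each validation fold, average over folds to get one target-AUC per task, then inspect the mean and variance across tasks.

```python
# Sketch of target-AUC: per-task ROC-AUC averaged over validation folds.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n_samples, n_tasks, n_folds = 200, 4, 5
# Hypothetical multi-task labels and predicted scores (one column per target).
y_true = rng.integers(0, 2, size=(n_samples, n_tasks))
y_score = 0.3 * y_true + rng.random((n_samples, n_tasks))  # weakly informative
fold_id = rng.integers(0, n_folds, size=n_samples)         # fold assignments

target_auc = []
for t in range(n_tasks):
    fold_aucs = [roc_auc_score(y_true[fold_id == f, t],
                               y_score[fold_id == f, t])
                 for f in range(n_folds)]
    target_auc.append(float(np.mean(fold_aucs)))  # average over folds

print(np.round(target_auc, 3))
print("mean:", round(float(np.mean(target_auc)), 3),
      "std:", round(float(np.std(target_auc)), 3))
```

A large spread in `target_auc` across tasks would reveal exactly the skewed performance that a single pooled ROC-AUC hides.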