The AUC was used to measure classification performance. Sensitivity, specificity, positive and negative predictive values, and positive and negative likelihood ratios of the deep learning algorithm were calculated for each disease. The optimal cutoff for each of the 3 diseases was calculated in advance using Youden’s J statistic on the validation set [28]. If the probabilities for HHD, HCM, and ALCA were all smaller than their corresponding optimal cutoffs, the diagnosis was “normal”. Otherwise, the highest of the three probabilities determined the final diagnosis. Cohen’s kappa coefficient and the confusion matrix were calculated to compare the diagnostic performance of the deep learning algorithm with that of the expert clinicians [29]. Diagnostic accuracy based on the confusion matrix was calculated as (true positives + true negatives)/(true positives + true negatives + false positives + false negatives). All statistical analyses were performed using R statistical software version 4.1.1 (The R Foundation for Statistical Computing, Vienna, Austria). p-values < 0.05 were considered statistically significant.
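The cutoff selection, decision rule, accuracy, and agreement statistic described above can be illustrated with a minimal Python sketch. This is not the authors' code (the analyses were run in R); the function names, the dictionary-based inputs, and the convention that a case counts as positive when its probability is greater than or equal to the threshold are all assumptions made for illustration. Tie-breaking between equally good cutoffs is not specified in the text, so the sketch simply keeps the lowest one.

```python
from collections import Counter

DISEASES = ("HHD", "HCM", "ALCA")

def youden_cutoff(labels, probs):
    """Return the threshold that maximizes J = sensitivity + specificity - 1.

    labels: 1 if the disease is present, 0 otherwise (both classes required).
    Assumption: a case is called positive when prob >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_j, best_t = -1.0, None
    for t in sorted(set(probs)):
        tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < t)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:  # strict comparison keeps the lowest tied cutoff
            best_j, best_t = j, t
    return best_t

def diagnose(probs, cutoffs):
    """Apply the decision rule as stated in the text.

    "normal" if every disease probability is below its cutoff; otherwise
    the disease with the highest probability (regardless of whether that
    particular probability exceeds its own cutoff).
    """
    if all(probs[d] < cutoffs[d] for d in DISEASES):
        return "normal"
    return max(DISEASES, key=lambda d: probs[d])

def binary_accuracy(y_true, y_pred, disease):
    """(TP + TN) / (TP + TN + FP + FN) for one disease versus the rest."""
    tp = sum(t == disease and p == disease for t, p in zip(y_true, y_pred))
    tn = sum(t != disease and p != disease for t, p in zip(y_true, y_pred))
    fp = sum(t != disease and p == disease for t, p in zip(y_true, y_pred))
    fn = sum(t == disease and p != disease for t, p in zip(y_true, y_pred))
    return (tp + tn) / (tp + tn + fp + fn)

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) if both raters use a single identical label.
    """
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

In use, `youden_cutoff` would be run once per disease on the validation set, and the resulting three cutoffs passed to `diagnose` for each test case; `cohen_kappa` would then compare the algorithm's diagnoses with the clinicians' diagnoses.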