Descriptive statistics included mean (SD) for continuous variables and proportions for categorical variables. Differences in the distribution of these variables across IFTA grades were tested using 1-way analysis of variance for continuous variables and a 2-sided Pearson χ2 test for categorical variables. Agreement between pathologists’ grading of IFTA was assessed using quadratic-weighted Cohen κ values. Performance metrics for the image classification task were precision (synonymous with positive predictive value as used in epidemiology), recall (synonymous with sensitivity), accuracy, and F1 score (calculated as the harmonic mean of precision and recall).
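As an illustration of the agreement and classification metrics named above, the following is a minimal sketch in plain Python, not the code used in the study. It assumes IFTA grades are encoded as integers 0 to k-1; the function names and the toy data are hypothetical. The weighted κ uses the standard quadratic disagreement weights, (i - j)^2 / (k - 1)^2.

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_grades):
    """Cohen kappa with quadratic (squared-distance) disagreement weights.

    Assumes grades are integers in 0..n_grades-1 (an encoding chosen for
    this sketch). Raises ZeroDivisionError if both raters use one grade only.
    """
    n = len(rater_a)
    # Observed cross-tabulation of the two raters' grades.
    observed = [[0] * n_grades for _ in range(n_grades)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row_totals = [sum(observed[i]) for i in range(n_grades)]
    col_totals = [sum(observed[i][j] for i in range(n_grades))
                  for j in range(n_grades)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_grades):
        for j in range(n_grades):
            weight = (i - j) ** 2 / (n_grades - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two raters.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    return 1.0 - weighted_observed / weighted_expected


def class_metrics(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treated one-vs-rest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0  # harmonic mean
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy example (made-up labels, for illustration only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(class_metrics(y_true, y_pred, label=1))
print(accuracy(y_true, y_pred))
print(quadratic_weighted_kappa(y_true, y_pred, n_grades=3))
```

Because the weights grow with the squared distance between grades, a disagreement of two grades is penalized four times as heavily as a disagreement of one grade, which suits an ordinal scale such as IFTA grade.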
Because clinical characteristics such as age, diabetes, hypertension, and the eGFR (derived using the Modification of Diet in Renal Disease equation) are associated with the IFTA grade, we examined whether combining the clinical characteristics with DL predictions improved the prediction. For this, we first fit a baseline multinomial logistic model that predicted the IFTA class using the DL-predicted IFTA class as the independent variable. In the next nested model, we added age, sex, hypertension, diabetes, body mass index, and eGFR as covariates. Incremental predictive performance of the clinical predictors over that of the DL prediction was assessed by comparing the likelihood ratio χ2, pseudo R2, and Brier score of the 2 models. To make the alternative model more robust, we also evaluated whether machine learning algorithms could further improve the IFTA predictions obtained by combining DL predictions with clinical characteristics. For this, we used the package CMA in R statistical software version 4.0.2 (R Project for Statistical Computing) and evaluated the following machine learning methods: component-wise boosting, linear discriminant analysis, diagonal discriminant analysis, partial least squares combined with linear discriminant analysis, feed-forward neural network, random forest, and support vector machines. All other statistical analyses were conducted in Stata statistical software version 12.0 (StataCorp). A global type I error rate of .05 was used to test statistical significance. Data analysis was performed from December 2019 to May 2020.
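The comparison of the nested models can be sketched as follows. This is a hedged illustration in plain Python rather than the Stata workflow used in the study: it assumes each fitted model supplies a list of predicted class probabilities per sample, it takes the pseudo R2 to be McFadden's (the text does not state which variant was used), and it uses the multiclass form of the Brier score. All function names and numbers are hypothetical.

```python
import math


def log_likelihood(probs, y_true):
    """Multinomial log-likelihood: sum of log P(observed class).

    probs[i][k] is the predicted probability of class k for sample i.
    Probabilities of exactly 0 for the observed class are not handled here.
    """
    return sum(math.log(p[y]) for p, y in zip(probs, y_true))


def brier_score(probs, y_true):
    """Multiclass Brier score: mean squared distance between the predicted
    probability vector and the one-hot vector of the observed class."""
    total = 0.0
    for p, y in zip(probs, y_true):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2
                     for k, pk in enumerate(p))
    return total / len(y_true)


def likelihood_ratio_chi2(ll_reduced, ll_full):
    """Likelihood ratio statistic for two nested models."""
    return 2.0 * (ll_full - ll_reduced)


def mcfadden_pseudo_r2(ll_model, ll_null):
    """McFadden pseudo R2 (assumed variant): 1 - LL(model) / LL(null)."""
    return 1.0 - ll_model / ll_null


# Made-up predicted probabilities for 4 samples and 3 IFTA classes.
y_true = [0, 1, 2, 1]
null_probs = [[0.25, 0.50, 0.25]] * 4          # intercept-only model
baseline_probs = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2],
                  [0.2, 0.3, 0.5], [0.3, 0.5, 0.2]]   # DL prediction only
full_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1],
              [0.1, 0.2, 0.7], [0.2, 0.7, 0.1]]       # DL + clinical covariates

ll_null = log_likelihood(null_probs, y_true)
ll_base = log_likelihood(baseline_probs, y_true)
ll_full = log_likelihood(full_probs, y_true)

print("LR chi2 (full vs baseline):", likelihood_ratio_chi2(ll_base, ll_full))
print("pseudo R2 baseline:", mcfadden_pseudo_r2(ll_base, ll_null))
print("pseudo R2 full:", mcfadden_pseudo_r2(ll_full, ll_null))
print("Brier baseline:", brier_score(baseline_probs, y_true))
print("Brier full:", brier_score(full_probs, y_true))
```

A larger likelihood ratio statistic and pseudo R2, together with a lower Brier score, for the model with clinical covariates would indicate incremental predictive value over the DL prediction alone. The degrees of freedom for the likelihood ratio test equal the number of parameters added by the covariates.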