We compared 2 approaches to predicting LOS, selected for their ease of implementation. First, we used a multivariate generalized linear model (GLM) with discharge by 5, 10, and 15 days as the dependent variable, including random effects for each hospital system. Initial candidate predictors were comorbidities, initial vitals, age, race, need for ICU or ventilator support, maximum O2 requirement, nursing home admission, and inter-hospital transfer. Final model features were selected using PC-Simple with a maximum condition set size of 3 [27].
The LOS prediction from the regression model was compared against a predictive model generated by a random forest (RF), which reduces potential bias from incorrect assumptions about the relationships and interactions among factors [28,29]. For each RF model, we generated variable importance plots based on the Gini impurity index. For simplicity, we report only the top 20 variables by importance. We used the area under the ROC curve (AUROC) to compare the accuracy of the two models. For each model, AUROC was computed on the test sets of 5-fold cross-validation, with 95% confidence intervals (CIs) calculated from 200 bootstrap samples. AUROCs for the two models were compared at the 5-, 10-, and 15-day thresholds as well as for mortality. Calibration curves for each model are provided in the Supplementary text.
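As an illustration of the evaluation metric, the following minimal sketch computes AUROC for one binary outcome (for example, discharge by day 5) and a 95% CI from 200 bootstrap resamples. It is not the study code. The function names `auroc` and `bootstrap_ci`, the percentile method for the interval, the resampling of individual test-set predictions, and the toy labels and scores are all assumptions made for this example; the text does not specify how the bootstrap was combined with cross-validation.

```python
import random


def auroc(y_true, y_score):
    """AUROC via the rank-sum (Mann-Whitney U) identity; ties get average ranks."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the group of tied scores starting at i.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC requires both outcome classes")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(y_true, y_score, n_boot=200, alpha=0.05, seed=0):
    """Point estimate plus percentile bootstrap CI (assumed method, see lead-in)."""
    point = auroc(y_true, y_score)  # also validates that both classes are present
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:  # skip resamples that lost one class
            stats.append(auroc(yt, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lower, upper


# Toy example with made-up values: 1 = discharged by the threshold day.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 0.75
```

In a 5-fold setting, one plausible use is to pool the held-out predictions from all folds and pass them to `bootstrap_ci`, repeating this for each threshold and model so that the intervals can be compared side by side.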