Model development

Jennifer A. Bishop, Hamza A. Javed, Rasheed el-Bouri, Tingting Zhu, Thomas Taylor, Tim Peto, Peter Watkinson, David W. Eyre, David A. Clifton

In this study, four supervised ML classifiers were considered. Random forest (RF) and support vector machine (SVM) models, which have previously been shown to perform well [12, 13, 15, 23, 24], were compared with deep neural networks (DNN) in the form of multilayer perceptron (MLP) models. Logistic regression (LR) models were also included as a baseline, being a strong comparator from medical statistics. The classifiers were assessed on their ability to predict whether an inpatient would be discharged within the next 24 hours, and the probability scores produced by each classifier were used to rank patients by their likelihood of discharge.
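The workflow above can be sketched as follows. This is an illustrative outline only, using synthetic data and arbitrary model settings rather than the study's datasets or tuned hyperparameters; the point is the shared pattern of fitting each classifier and ranking patients by predicted discharge probability.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the engineered feature matrix and 24-hour discharge label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)  # 1 = discharged within 24 h

models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(probability=True, random_state=0),  # probability=True enables predict_proba
    "DNN": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0),
}

rankings = {}
for name, model in models.items():
    model.fit(X, y)
    p_discharge = model.predict_proba(X)[:, 1]  # probability of discharge within 24 h
    rankings[name] = np.argsort(-p_discharge)   # patient indices, most to least likely
```

Each entry of `rankings` is then an ordering of patients from most to least likely to be discharged, which is the quantity a bed-management user would consume.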

Model hyperparameters were selected through a nested K-fold cross-validation scheme on the De0 dataset, where the outer and inner loops consisted of 5 and 3 folds respectively. The outer 5-fold scheme partitioned the data into training and evaluation folds, while the inner 3-fold scheme further split each training fold into training and validation sets used to assess the performance of different hyperparameter choices. A grid-search approach was used to test hyperparameter combinations, with the combination giving the highest average AUROC across all validation folds selected for each model. The hyperparameter values determined and used are detailed in Table 3.
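A minimal sketch of this nested scheme, assuming a scikit-learn-style implementation: the inner 3-fold grid search selects hyperparameters by mean AUROC, and the outer 5-fold loop estimates performance of the tuned model. The grid and estimator here are illustrative placeholders, not the study's Table 3 values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Inner 3-fold loop: grid search over hyperparameter combinations, scored by AUROC.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}  # illustrative grid
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, scoring="roc_auc", cv=inner_cv)

# Outer 5-fold loop: evaluation of the tuned model on held-out folds.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
nested_auroc = cross_val_score(search, X, y, scoring="roc_auc", cv=outer_cv)

# Final hyperparameter choice: refit the inner search on the full development set.
search.fit(X, y)
best_params = search.best_params_
```

`nested_auroc` gives one AUROC per outer fold; its mean is the unbiased performance estimate, while `best_params` is the combination that would be carried forward.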

Table 3. The LR, RF, SVM and DNN models’ hyperparameters.

Domain knowledge and prior literature were used to determine which information within the dataset would be most useful for predicting patient discharge. Handcrafted features used to train the models included: age, day of the week, procedures information, ICU information and statistical representations of the National Early Warning Score (NEWS) metric [29], which encodes vital-sign information, binned into 24-hour periods. Temporal features such as ‘time elapsed since procedure’, ‘time elapsed since ICU discharge’ and the NEWS-derived features were populated in ‘real time’, being included in the models only when the information would have been available at the time of prediction. A maximum of 79 features were engineered, the full list of which is summarised in Table 4.
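The 24-hour binning of NEWS observations can be illustrated as below. The record layout (a per-observation table of hours since admission and NEWS values) and the choice of summary statistics are assumptions for the sake of the example, not the authors' exact feature definitions.

```python
import numpy as np
import pandas as pd

# Hypothetical per-observation NEWS records for a single admission.
rng = np.random.default_rng(0)
obs = pd.DataFrame({
    "hours_since_admission": np.sort(rng.uniform(0, 72, size=30)),
    "news": rng.integers(0, 10, size=30),
})

# Bin observations into 24-hour periods and summarise each bin statistically,
# yielding the kind of "statistical representation of NEWS" features described.
obs["day_bin"] = (obs["hours_since_admission"] // 24).astype(int)
news_features = obs.groupby("day_bin")["news"].agg(["mean", "max", "min", "std"])
```

In a real-time setting, only the bins completed up to the prediction time would be populated; later bins would simply be absent for that patient.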

Table 4. All features engineered from the data within the EHR. All features were included in the LR, RF and DNN models. Feature selection was carried out to determine the features to be included in the SVM models from this set. Additional detail about each feature’s data type can be found in Appendix B in (S1 File).

For operational purposes in hospital, it is preferable for a decision support tool to be able to make predictions for all patient groups in the hospital at any given time. Patient diagnosis is typically classified using International Classification of Diseases (ICD) or “Clinical Classifications Software” (CCS) groupings [30], both of which contain too many diagnostic groups to be easily included as ML features directly. As stated earlier, most prior studies restrict themselves to a handful of patient diagnostic categories or a specific patient type. In this study, to directly capture the effect of a patient’s diagnostic category on LOS, features containing the historic mean and variance of the LOS of patients within the same diagnostic category as the patient-under-test were developed. The historic mean and variance of LOS for a particular CCS category were calculated using the training dataset. These mean and variance values were then assigned to patients of the same CCS category in both the training and the test datasets. For patients in the test set with an unseen CCS category, the average over all diagnostic categories was assigned for each feature. Under present hospital processes, diagnostic categories are assigned and recorded on a patient’s discharge. As such, the information used in this study can be thought of as a proxy for the working diagnosis assigned by clinicians during a patient’s stay. If implemented as a decision support tool, the suspected CCS category could be recorded by clinicians and used within the models in real time.
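The CCS-derived LOS features can be sketched as follows, with a toy training/test split standing in for the real data. The key details from the text are preserved: statistics are computed on the training set only, mapped onto both sets, and unseen test categories fall back to the average over all categories.

```python
import pandas as pd

# Hypothetical records: CCS diagnostic category and length of stay (days).
train = pd.DataFrame({"ccs": ["A", "A", "B", "B", "B"],
                      "los": [2.0, 4.0, 7.0, 5.0, 6.0]})
test = pd.DataFrame({"ccs": ["A", "C"]})  # "C" is unseen in training

# Historic LOS mean and variance per CCS category, from training data only.
stats = train.groupby("ccs")["los"].agg(los_mean="mean", los_var="var")

# Assign to both sets; unseen categories receive the average over all categories.
for df in (train, test):
    df["los_mean"] = df["ccs"].map(stats["los_mean"]).fillna(stats["los_mean"].mean())
    df["los_var"] = df["ccs"].map(stats["los_var"]).fillna(stats["los_var"].mean())
```

This keeps the feature dimensionality fixed at two columns regardless of how many diagnostic categories exist, which is what makes it tractable compared with one-hot encoding hundreds of ICD/CCS groups.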

For the SVM models, which are particularly sensitive to the inclusion of features with low predictive value, feature selection techniques were applied. The Spatially Uniform ReliefF (SURF) [31] feature selection algorithm was used, as we found it to be the most robust against white-noise features and one of the most consistent at selecting similar sets of features across 3-fold cross-validation in a comparison of feature selection algorithms. The algorithm uses the proximity of samples in feature space to describe how feature interactions relate to the sample’s class. The normalised scores from running the SURF feature selection algorithm over the engineered features were generated (Fig 2). The detailed methodology of running this algorithm can be found in Appendix C in (S1 File). For the other, non-SVM, ML models, all features described in Table 4 were used.

Fig 2. The normalised scores resulting from our feature selection methodology, using the SURF feature selection algorithm, for each feature in the datasets. The features on the x-axis of this plot are summarised in Table 4.
