Modeling

Up to three observations were constructed for each patient, depending on length of stay, by averaging the available predictor values over the 24 h preceding days 1, 7, and 14 of IMV. This process is illustrated in Additional file 1: Fig. S2. ICU mortality was modeled as a classification problem using decision tree, logistic regression, and XGBoost algorithms, so that the performance of both simple and complex, linear and non-linear models could be compared. Ventilator-free and ICU-free days were treated as regression problems and modeled with Lasso and Ridge linear models as well as an XGBoost regressor. For every outcome and every algorithm, a single model was fit on data points from all days.

Overall model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), average precision, calibration loss, and Brier score. Nested cross-validation was performed for hyperparameter optimization and to assess performance on the whole dataset. This approach first splits the data into five outer holdout sets, each containing 20% of the data. For each holdout set, the remaining 80% of the data were used to fit and optimize a model via fivefold cross-validation and a randomized search over a predefined range of hyperparameter values. Observations belonging to the same patient were always kept in the same set to avoid information leakage. A graphical representation of the process is shown in Additional file 1: Fig. S3.

For each outer holdout set, data imputation, standardization, and automated feature selection were performed independently on the train set and then applied to the test set. Missing data were imputed with the median for simplicity, and predictors were standardized to a mean of 0 and a standard deviation of 1. A Lasso regression was used for automatic feature selection [17], and its L1 regularization term was optimized together with the classifiers' hyperparameters. The best-performing estimator from each inner cross-validation was then evaluated on the corresponding holdout test set, and overall performance was reported as the average across the five outer folds.
