Modeling

Lucas M. Fleuren; Michele Tonutti; Daan P. de Bruin; Robbert C. A. Lalisang; Tariq A. Dam; Diederik Gommers; Olaf L. Cremer; Rob J. Bosman; Sebastiaan J. J. Vonk; Mattia Fornasa; Tomas Machado; Nardo J. M. van der Meer; Sander Rigter; Evert-Jan Wils; Tim Frenzel; Dave A. Dongelmans; Remko de Jong; Marco Peters; Marlijn J. A. Kamps; Dharmanand Ramnarain; Ralph Nowitzky; Fleur G. C. A. Nooteboom; Wouter de Ruijter; Louise C. Urlings-Strop; Ellen G. M. Smit; D. Jannet Mehagnoul-Schipper; Tom Dormans; Cornelis P. C. de Jager; Stefaan H. A. Hendriks; Evelien Oostdijk; Auke C. Reidinga; Barbara Festen-Spanjer; Gert Brunnekreef; Alexander D. Cornet; Walter van den Tempel; Age D. Boelens; Peter Koetsier; Judith Lens; Sefanja Achterberg; Harald J. Faber; A. Karakus; Menno Beukema; Robert Entjes; Paul de Jong; Taco Houwert; Hidde Hovenkamp; Roberto Noorduijn Londono; Davide Quintarelli; Martijn G. Scholtemeijer; Aletta A. de Beer

This method is extracted from research article: Intensive Care Med Exp, Jun 2021

Risk factors for adverse outcomes during mechanical ventilation of 1152 COVID-19 patients: a multicenter machine learning study with highly granular data from the Dutch Data Warehouse

DOI: 10.1186/s40635-021-00397-5

Up to three observations were constructed for each patient, depending on their length of stay, by averaging the available predictor values in the 24 h preceding the 1-day, 7-day, and 14-day marks of invasive mechanical ventilation (IMV). This process is illustrated in Additional file 1: Fig. S2. ICU mortality was modeled as a classification problem using a decision tree, logistic regression, and XGBoost, to compare the performance of simple and complex, linear and non-linear models. Ventilator-free days and ICU-free days were treated as regression problems using Lasso and Ridge linear models and an XGBoost regressor. For every outcome and every algorithm, a single model was fit on the pooled observations from all days.
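The windowed averaging described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `build_observations`, the `(hours_since_imv_start, value)` layout, and the rule that a time point is used only when the stay reaches it are all assumptions made for the example.

```python
from statistics import mean

LANDMARK_DAYS = (1, 7, 14)  # time points named in the text
WINDOW_HOURS = 24           # averaging window preceding each time point


def build_observations(measurements, stay_hours):
    """Average one predictor over the 24 h before each reached time point.

    measurements: list of (hours_since_imv_start, value) pairs (assumed layout).
    stay_hours:   hours of follow-up available for this patient (assumption).
    Returns {day: mean value}, with None where the window holds no
    measurements, so that the gap can be imputed later.
    """
    observations = {}
    for day in LANDMARK_DAYS:
        window_end = day * 24
        if stay_hours < window_end:
            break  # later time points were not reached either
        window_start = window_end - WINDOW_HOURS
        values = [v for t, v in measurements if window_start <= t < window_end]
        observations[day] = mean(values) if values else None
    return observations


# A patient followed for 200 h yields observations for day 1 and day 7 only.
example = build_observations(
    [(2, 10.0), (20, 14.0), (150, 8.0), (160, 12.0)], stay_hours=200
)
```

In a real dataset the same step would run once per predictor and per patient, producing up to three rows per patient.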

Overall model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), average precision, calibration loss, and Brier score. Nested cross-validation was used both to optimize hyperparameters and to assess performance on the whole dataset. The data were first split into five outer holdout sets, each containing 20% of the data. For each holdout set, the remaining 80% of the data were used to fit and optimize a model via fivefold cross-validation with a randomized search over a predefined range of hyperparameter values. Observations belonging to the same patient were always kept in the same set to avoid information leakage. A graphical representation of the process is shown in Additional file 1: Fig. S3.
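A rough sketch of patient-grouped nested cross-validation follows, written with the standard library only so the control flow is visible. The names `grouped_folds`, `nested_cv`, and `fit_and_score` are hypothetical, and `candidates` stands in for hyperparameter settings that a randomized search would draw from the predefined ranges; the study's actual implementation is not given in the text.

```python
import random


def grouped_folds(patient_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) so that no patient appears on both sides."""
    unique = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique)
    fold_of = {pid: i % n_folds for i, pid in enumerate(unique)}
    for k in range(n_folds):
        test = [i for i, pid in enumerate(patient_ids) if fold_of[pid] == k]
        train = [i for i, pid in enumerate(patient_ids) if fold_of[pid] != k]
        yield train, test


def nested_cv(patient_ids, fit_and_score, candidates, n_folds=5):
    """Return the mean outer-fold score.

    fit_and_score(params, train_idx, test_idx) is a caller-supplied function
    (hypothetical signature) that fits on train_idx and returns a score on
    test_idx, where higher is better.
    """
    outer_scores = []
    for train, test in grouped_folds(patient_ids, n_folds, seed=0):
        train_ids = [patient_ids[i] for i in train]

        def inner_score(params):
            # Inner folds index into `train`, so map back to original rows.
            scores = [
                fit_and_score(params, [train[i] for i in tr], [train[i] for i in va])
                for tr, va in grouped_folds(train_ids, n_folds, seed=1)
            ]
            return sum(scores) / len(scores)

        best = max(candidates, key=inner_score)  # tuned on the 80% only
        outer_scores.append(fit_and_score(best, train, test))
    return sum(outer_scores) / len(outer_scores)


# Ten patients with three observations each; a dummy scorer that prefers 3.
ids = [p for p in range(10) for _ in range(3)]
score = nested_cv(ids, lambda p, tr, te: -abs(p - 3), candidates=[1, 3, 5])
```

Splitting on patient identifiers rather than on rows is what keeps the up-to-three observations of one patient together in a single fold.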

For each outer fold, data imputation, standardization, and automated feature selection were fit independently on the training set and then applied to the corresponding test set. Missing data were imputed with the median for simplicity, and predictors were standardized to have a mean of 0 and a standard deviation of 1. A Lasso regression was used for automatic feature selection [17], and its L1 regularization term was optimized together with the classifiers’ hyperparameters. The best-performing estimator from each inner cross-validation was then evaluated on the corresponding holdout test set. Overall performance was computed as the average across all outer folds.
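The point of fitting preprocessing on the training set only can be shown for a single predictor. The sketch below covers median imputation and standardization; the Lasso-based feature selection is deliberately left out, and the function names and the use of the population standard deviation are assumptions for illustration.

```python
from statistics import mean, median, pstdev


def fit_preprocessor(train_column):
    """Learn the imputation and scaling parameters from training data only."""
    observed = [v for v in train_column if v is not None]
    med = median(observed)
    filled = [med if v is None else v for v in train_column]
    mu = mean(filled)
    sd = pstdev(filled) or 1.0  # guard against constant columns
    return med, mu, sd


def apply_preprocessor(column, params):
    """Impute and standardize any column using the training parameters."""
    med, mu, sd = params
    return [((med if v is None else v) - mu) / sd for v in column]


# Parameters come from the training fold and are reused, unchanged, on test.
params = fit_preprocessor([1.0, None, 3.0, 2.0])
train_scaled = apply_preprocessor([1.0, None, 3.0, 2.0], params)
test_scaled = apply_preprocessor([None, 4.0], params)
```

Because the test column is transformed with the training median, mean, and standard deviation, no information from the holdout set reaches the fitted model.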

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
