SUPERLEARNER PREDICTION

S. Ariane Christie
Alan E Hubbard
Rachael A Callcut
Morad Hameed
Fanny Nadia Dissak-Delon
David Mekolo
Arabo Saidou
Alain Chichom Mefire
Pierre Nsongoo
Rochelle A. Dicker
Mitchell Jay Cohen
Catherine Juillard

SL is a previously validated ensemble machine learning algorithm that has been described in detail in prior publications (18). SL is available as a package for the R programming language (25). Both the R software and the SL package are open-source and free of charge (https://cran.r-project.org/ and https://cran.r-project.org/web/packages/SuperLearner/index.html, respectively), which makes the method widely accessible to clinicians and researchers in both high-income and LMIC contexts.

Rather than pre-specifying a single statistical approach, SL evaluates multiple candidate algorithms, ranging from simple logistic regression to highly complex machine learning (e.g., neural nets), in order to optimally predict outcomes of interest from complex datasets. SL uses cross-validation to tailor a weighted (convex) combination of learners, that is, one with non-negative weights that sum to one, to optimize prediction on new data from the same data-generating distribution. Because each learner is scored on data held out from its fit, the embedded cross-validation guards against over-fitting (18).
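The idea of a cross-validated convex combination can be illustrated with a minimal sketch. This is not the SuperLearner package or the study's code: the two toy learners (`fit_mean`, `fit_linear`), the simulated data, the squared-error loss, and the grid search over a single weight are all assumptions chosen to keep the example short and dependency-free.

```python
import random

def fit_mean(xs, ys):
    """Toy learner 1 (hypothetical): always predict the training mean outcome."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Toy learner 2 (hypothetical): one-variable least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: mean_y + slope * (x - mean_x)

def cv_predictions(xs, ys, learners, k=5):
    """Return out-of-fold predictions, one list per learner."""
    n = len(xs)
    preds = [[0.0] * n for _ in learners]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for j, fit in enumerate(learners):
            model = fit(train_x, train_y)  # fitted without the held-out fold
            for i in held_out:
                preds[j][i] = model(xs[i])
    return preds

def mse(preds, ys):
    """Mean squared error between predictions and outcomes."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def convex_weights(preds, ys, steps=100):
    """Grid-search w in [0, 1] so that w*p0 + (1-w)*p1 has the lowest CV error."""
    best_w, best_loss = 0.0, float("inf")
    for s in range(steps + 1):
        w = s / steps
        blend = [w * a + (1 - w) * b for a, b in zip(preds[0], preds[1])]
        loss = mse(blend, ys)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return (best_w, 1 - best_w), best_loss

# Simulated data (assumption): binary outcome whose probability rises with x.
random.seed(0)
xs = [random.random() for _ in range(200)]
ys = [1.0 if random.random() < x else 0.0 for x in xs]

preds = cv_predictions(xs, ys, [fit_mean, fit_linear])
weights, cv_loss = convex_weights(preds, ys)
print("weights:", weights, "cross-validated MSE:", cv_loss)
```

Because the grid includes the weights 0 and 1, the selected blend can never have a higher cross-validated error than either learner alone. A real analysis would use more than two learners and a proper constrained optimizer rather than a grid.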

In this study, SL was applied to all admission variables of the US, SA, and Cameroonian cohorts to generate setting-specific predictions of hospital mortality. We used a set of algorithms including logistic and linear regression, generalized additive models with various levels of smoothing (26), random forest (27), lasso (28), and systems based on sieves of parametric models (e.g., polyclass). To assess how well the resulting SL fit predicts future data, we estimated the cross-validated area under the receiver operating characteristic (ROC) curve (cv-AUC). We also used cv-AUC as the objective function, so that the procedure optimizes prediction toward a more clinically relevant measure of performance (in this case, a function of sensitivity and specificity). SL predictions and cross-validated risk were used to evaluate the performance of each model (29).
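As an illustration of the cv-AUC measure, the sketch below computes AUC as the proportion of (death, survivor) pairs in which the death receives the higher predicted score, counting ties as one half, and then averages that value over held-out folds. It is a sketch under stated assumptions, not the study's implementation: the function names (`auc`, `cv_auc`, `fit_linear`), the single-predictor toy learner, and the example data are all hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

def fit_linear(xs, ys):
    """Toy learner (hypothetical): one-variable least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: mean_y + slope * (x - mean_x)

def cv_auc(xs, ys, fit, k=5):
    """Average the AUC over k held-out folds (each fold needs both classes)."""
    n = len(xs)
    fold_aucs = []
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores = [model(xs[i]) for i in held_out]
        fold_aucs.append(auc(scores, [ys[i] for i in held_out]))
    return sum(fold_aucs) / k

# Example data (assumption): risk rises with x, with two labels flipped as noise.
xs = [float(i) for i in range(40)]
ys = [1 if x >= 20 else 0 for x in xs]
ys[18], ys[22] = 1, 0

print("cv-AUC:", cv_auc(xs, ys, fit_linear))
```

Using cv-AUC as the objective function would mean choosing the learner weights that maximize this quantity rather than minimizing a squared-error or likelihood loss.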
