Logistic regression to predict academic success from applicant data

Tal Baron; Robert I. Grossman; Steven B. Abramson; Martin V. Pusic; Rafael Rivera; Marc M. Triola; Itai Yanai

This method is extracted from a research article published in PLoS One, January 2020:

Signatures of medical student applicants and academic success

DOI: 10.1371/journal.pone.0227108

Finally, we asked whether the signature is an important feature by testing whether it improves the prediction of academic success. Specifically, to assess whether the four clusters explain variability in the success index beyond that captured by the raw data, we tested whether predicting the success index from the base application features becomes more accurate when each student's signature is added to the model. We compared two approaches for predicting the success index: (1) using the application features alone, and (2) using the application features together with the signature assignments.

To obtain a valid and robust predictive model, we further compressed the success index into a three-level success score: low-performing students (original success index of 0, 1 or 2) were scored 0, medium-performing students (3 or 4) were scored 1, and high-performing students (5, 6 or 7) were scored 2. Compression helps because each level of the success score then contains more samples, which lowers the error rate of the classifier [26].

We used 3-fold cross-validation to optimize a logistic regression model, fitted it on the features of the training group with the compressed success score as the target variable, and deployed it on the test group to predict the compressed success score of those students. In parallel, we fitted a second optimized logistic regression model in the same way, but with the signature assignment included as an additional feature. To infer the clusters of students in the test group, we assigned each student to the nearest K-means cluster delineated by the training group. The two fitted models were then compared for goodness of fit with a likelihood ratio (LR) test on the training data.
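As a minimal sketch (the protocol does not name a language; Python is assumed here), the three-level compression of the 0-7 success index described above can be written as a single mapping function. The name compress_success_index is hypothetical.

```python
def compress_success_index(index: int) -> int:
    """Map the original 0-7 success index to the three-level success score.

    Hypothetical helper: 0-2 -> 0 (low), 3-4 -> 1 (medium), 5-7 -> 2 (high).
    """
    if not 0 <= index <= 7:
        raise ValueError(f"success index must be in 0..7, got {index}")
    if index <= 2:
        return 0
    if index <= 4:
        return 1
    return 2


if __name__ == "__main__":
    print([compress_success_index(i) for i in range(8)])
    # prints [0, 0, 0, 1, 1, 2, 2, 2]
```

Rejecting out-of-range values, rather than silently clamping them, makes data-entry errors in the original index visible before model fitting.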
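The protocol does not state which software, penalty or selection criterion was used to "optimize" the logistic regression. The dependency-free sketch below assumes a multinomial logistic regression with an L2 penalty on the slopes, fitted by full-batch gradient descent, where 3-fold cross-validation picks the penalty strength with the highest held-out log-likelihood. All function names (fit_logreg, predict_proba, cv_select_l2) and the default learning rate and epoch count are illustrative assumptions, not the authors' settings.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax of a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def _logits(W, x):
    """Class logits for one row; the last weight of each class row is its intercept."""
    xb = list(x) + [1.0]
    return [sum(w * v for w, v in zip(Wk, xb)) for Wk in W]


def fit_logreg(X, y, n_classes, l2=1.0, lr=0.1, epochs=500):
    """Minimize mean NLL + (l2 / 2n) * ||slopes||^2 by full-batch gradient descent.

    Intercepts are not penalized. Labels y must be integers 0..n_classes-1.
    """
    n, d = len(X), len(X[0])
    W = [[0.0] * (d + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (d + 1) for _ in range(n_classes)]
        for x, t in zip(X, y):
            xb = list(x) + [1.0]
            p = softmax(_logits(W, x))
            for k in range(n_classes):
                err = p[k] - (1.0 if k == t else 0.0)
                for j in range(d + 1):
                    grad[k][j] += err * xb[j] / n
        for k in range(n_classes):
            for j in range(d):  # L2 penalty on slopes only
                grad[k][j] += l2 * W[k][j] / n
            for j in range(d + 1):
                W[k][j] -= lr * grad[k][j]
    return W


def predict_proba(W, X):
    """Class probabilities for each row of X."""
    return [softmax(_logits(W, x)) for x in X]


def cv_select_l2(X, y, n_classes, grid, k=3, seed=0):
    """Return the L2 strength in grid with the best k-fold held-out log-likelihood."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best_l2, best_ll = None, -math.inf
    for l2 in grid:
        ll = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            W = fit_logreg([X[i] for i in train], [y[i] for i in train], n_classes, l2)
            probs = predict_proba(W, [X[i] for i in fold])
            for p, i in zip(probs, fold):
                ll += math.log(max(p[y[i]], 1e-300))  # guard against log(0)
        if ll > best_ll:
            best_l2, best_ll = l2, ll
    return best_l2
```

Under these assumptions, each of the two models in the protocol would be obtained by running cv_select_l2 and then fit_logreg on the training group (once on the application features alone, once with the signature columns appended) and applying predict_proba to the test group. In practice a library routine such as scikit-learn's LogisticRegressionCV with cv=3 would normally replace this hand-written loop; the sketch only makes the cross-validation logic explicit.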
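The nearest-cluster step for the test group can be sketched without any library, assuming the K-means centroids from the training group are already available and live in the same preprocessed feature space as the student rows, and assuming Euclidean distance (the criterion K-means itself minimizes). How the signature enters the model is not specified in the protocol; this sketch assumes a one-hot encoding with cluster 0 as the reference level. The names nearest_cluster and add_signature are hypothetical.

```python
import math
from typing import List, Sequence


def nearest_cluster(x: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Index of the training centroid closest to x (Euclidean distance; ties -> lowest index)."""
    return min(range(len(centroids)), key=lambda k: math.dist(x, centroids[k]))


def add_signature(
    features: Sequence[Sequence[float]], centroids: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Append one-hot signature columns (cluster 0 dropped as reference) to each row."""
    out = []
    for row in features:
        k = nearest_cluster(row, centroids)
        dummies = [1.0 if k == j else 0.0 for j in range(1, len(centroids))]
        out.append(list(row) + dummies)
    return out
```

With scikit-learn, calling predict on a KMeans model fitted to the training group performs the same nearest-centroid assignment.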
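For nested models, the likelihood ratio statistic is 2 × (LL_full - LL_reduced), compared against a chi-square distribution whose degrees of freedom equal the number of extra parameters in the full model. The sketch below computes each model's multinomial log-likelihood from its predicted class probabilities on the training data and evaluates the chi-square upper tail in closed form for integer degrees of freedom, so it needs only the standard library. The helper names are hypothetical, and the degrees of freedom depend on how the signature is encoded (for example, three one-hot columns in a three-class multinomial model with one reference class add 3 × 2 = 6 parameters). The chi-square reference assumes unpenalized maximum-likelihood fits, so with a regularized model the p-value is only approximate.

```python
import math
from typing import Sequence, Tuple


def log_likelihood(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Multinomial log-likelihood: sum of log P(observed class) over samples."""
    return sum(math.log(p[y]) for p, y in zip(probs, labels))


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail erfc(sqrt(x/2)) plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # first correction term
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def likelihood_ratio_test(ll_reduced: float, ll_full: float, df: int) -> Tuple[float, float]:
    """Return (LR statistic, p-value) for a full model nested over a reduced one."""
    stat = 2.0 * (ll_full - ll_reduced)
    return stat, chi2_sf(stat, df)
```

A negative statistic (possible when penalized fits are compared) is mapped to p = 1, since the full model then fits no better than the reduced one.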

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.