Logistic regression to predict academic success from applicant data

Tal Baron
Robert I. Grossman
Steven B. Abramson
Martin V. Pusic
Rafael Rivera
Marc M. Triola
Itai Yanai

Finally, we asked whether the signature is an important feature by determining if it improves the prediction of academic success. Specifically, to assess whether the four clusters explained variability in the success index beyond that explained by the raw data, we tested whether predicting the success index from the base application features becomes more accurate when each student's signature is added to the model. We compared two approaches for predicting the success index: (1) using the application features alone, and (2) using the application features together with the signature assignments.

To produce a valid and robust predictive model, we further compressed the success index into a three-level success score: low-performing students (original success index of 0, 1, or 2) were scored 0, medium-performing students (original success index of 3 or 4) were scored 1, and high-performing students (original success index of 5, 6, or 7) were scored 2. Compression is helpful because each level of the success score then covers a larger number of samples, which decreases the error rate of the classifier [26].

We used a 3-fold cross-validation procedure to find an optimized logistic regression model and fitted it on the features of the training group, using the compressed success score as the target variable. We then applied this model to the test group to predict the compressed success scores of those students. In parallel, we fitted a second optimized logistic regression model in the same way, but with the signature assignment included as an additional feature. To infer the cluster of each student in the test group, we assigned the student to the nearest K-means cluster delineated from the training group. A likelihood ratio (LR) test was performed on the two fitted models, using the training data, to compare their goodness of fit.
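
Three of the steps described above can be illustrated with a minimal Python sketch: the compression of the success index, the nearest-cluster assignment for test students, and the LR statistic. This is not the authors' code. The function names, the use of Euclidean distance to K-means centroids, and the assumption that the log-likelihoods of the two fitted models are already available are all illustrative choices that the text does not specify.

```python
import math


def compress_success_index(index):
    """Map the original 0-7 success index to the three-level success score."""
    if not 0 <= index <= 7:
        raise ValueError("success index must be between 0 and 7")
    if index <= 2:
        return 0  # low-performing: original scores 0, 1, 2
    if index <= 4:
        return 1  # medium-performing: original scores 3, 4
    return 2      # high-performing: original scores 5, 6, 7


def nearest_cluster(features, centroids):
    """Return the position of the centroid closest to `features`.

    Assumption: "nearest" means smallest Euclidean distance to centroids
    learned on the training group (hypothetical helper, not from the source).
    """
    return min(range(len(centroids)),
               key=lambda k: math.dist(features, centroids[k]))


def likelihood_ratio_statistic(loglik_base, loglik_full):
    """LR statistic for nested models: 2 * (logL_full - logL_base).

    Assumption: both log-likelihoods were computed on the same training
    data, with the base model nested inside the full model.
    """
    return 2.0 * (loglik_full - loglik_base)
```

Under the usual asymptotic result, the LR statistic is compared with a chi-square distribution whose degrees of freedom equal the number of parameters added by the larger model. That number depends on how the signature assignment is encoded, which the text does not state.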
