Classification task

Similarly, the dataset was separated into male/female subsets, each of which was randomly split into training (4,980/6,348 subjects), validation (996/1,270 subjects), and test (1,246/1,587 subjects) sets, accounting for approximately 60%, 20%, and 20% of the total sample size, respectively. The splits were stratified so that the training, validation, and test sets all had the same ratio of weak to healthy hand grip strength labels (approx. 16%/84% using a T-score of -2 and 46%/54% using a T-score of -1). The Synthetic Minority Over-sampling Technique (SMOTE) was applied to the training set only, and all features were standardized before feature selection.
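
A minimal sketch of these splitting, oversampling, and scaling steps is given below, assuming scikit-learn and imbalanced-learn were used; the feature matrix X, the labels y, the split proportions, and the random seed are illustrative placeholders rather than the authors' actual data or settings.

    # Sketch of stratified 60/20/20 splitting, SMOTE on the training set,
    # and feature standardization (variables X, y are illustrative).
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from imblearn.over_sampling import SMOTE

    # X: feature matrix for one sex-specific subset; y: binary weak/healthy labels
    rng = np.random.RandomState(42)
    X = rng.normal(size=(1000, 20))
    y = rng.binomial(1, 0.16, size=1000)

    # Stratified 60/20/20 split: hold out 20% as the test set first,
    # then split the remainder 75/25 so validation is 20% of the total
    X_trainval, X_test, y_trainval, y_test = train_test_split(
        X, y, test_size=0.20, stratify=y, random_state=42)
    X_train, X_val, y_train, y_val = train_test_split(
        X_trainval, y_trainval, test_size=0.25, stratify=y_trainval, random_state=42)

    # SMOTE is applied to the training set only, so validation and test
    # sets keep the original class ratio
    X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

    # Standardize features before feature selection; the scaler is fit on training data
    scaler = StandardScaler().fit(X_train_res)
    X_train_res = scaler.transform(X_train_res)
    X_val = scaler.transform(X_val)
    X_test = scaler.transform(X_test)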

The features were fed into a supervised classifier developed in Python 3 (Python Software Foundation, Delaware, US). The classifier considered in this analysis was a bagging ensemble model with additional balancing, using XGBoost as the base estimator. Balanced accuracy was used as the metric to quantify goodness-of-fit, comparing the predictions of the classifier with the true labels. A grid search was employed on the training set to find optimal values for the model hyper-parameters. Model fitting and feature selection (based on SelectKBest with the ANOVA F-value, f_classif, as the scoring function) were carried out simultaneously. For each combination of hyper-parameter values, a 5-fold cross-validation was performed on the training data and the corresponding balanced accuracy was obtained. The combination of hyper-parameters returning the highest balanced accuracy was taken as the optimum, and the selected model was evaluated on the validation set to assess its generalizability. Subsequently, the training and validation sets were merged into a single new training set, the model was re-trained (with the optimal hyper-parameters and selected features), and the balanced accuracy was obtained on the test set. This procedure was carried out for the binary classification labels defined using a T-score of -2 and then repeated for those defined using a T-score of -1.
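
The sketch below outlines this joint feature-selection and model-fitting procedure, assuming scikit-learn's SelectKBest and GridSearchCV, imbalanced-learn's BalancedBaggingClassifier, and xgboost's XGBClassifier; the hyper-parameter grid and the candidate numbers of selected features are illustrative assumptions, not the authors' exact values, and the data variables are reused from the previous sketch.

    # Hedged sketch of grid search with simultaneous feature selection and
    # model fitting, scored by balanced accuracy (illustrative grid values).
    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import balanced_accuracy_score
    from imblearn.ensemble import BalancedBaggingClassifier
    from xgboost import XGBClassifier

    # Feature selection and the balanced bagging ensemble (XGBoost base
    # estimator) are tuned jointly inside one pipeline
    pipe = Pipeline([
        ("select", SelectKBest(score_func=f_classif)),
        ("clf", BalancedBaggingClassifier(
            estimator=XGBClassifier(eval_metric="logloss"),  # 'estimator' in recent imbalanced-learn versions
            random_state=42)),
    ])

    param_grid = {                      # illustrative grid, not the authors' exact values
        "select__k": [10, 15, 20],
        "clf__n_estimators": [10, 25, 50],
        "clf__estimator__max_depth": [3, 5],
    }

    # 5-fold cross-validation on the training data, scored by balanced accuracy
    search = GridSearchCV(pipe, param_grid, scoring="balanced_accuracy", cv=5)
    search.fit(X_train_res, y_train_res)

    # Evaluate the selected model on the held-out validation set
    print("validation balanced accuracy:",
          balanced_accuracy_score(y_val, search.predict(X_val)))

    # Merge training and validation sets, re-train with the optimal
    # configuration, and report balanced accuracy on the test set
    best = search.best_estimator_
    best.fit(np.vstack([X_train_res, X_val]), np.concatenate([y_train_res, y_val]))
    print("test balanced accuracy:",
          balanced_accuracy_score(y_test, best.predict(X_test)))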
