Classification task

Similarly, the dataset was separated into male/female subsets, each of which was randomly split into training (4,980/6,348 subjects), validation (996/1,270 subjects), and test (1,246/1,587 subjects) sets, accounting for approximately 60%, 20%, and 20% of the total sample size, respectively. The splits were stratified so that the training, validation, and test sets all had the same ratio of weak to healthy hand grip strength labels (approx. 16%/84% using a T-score of -2 and 46%/54% using a T-score of -1). The Synthetic Minority Over-sampling Technique (SMOTE) was applied to the training set only, and all features were standardized before feature selection.
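
A minimal sketch of these splitting, oversampling, and scaling steps is given below, assuming scikit-learn and imbalanced-learn were used; the feature matrix X, the labels y, the split proportions, and the random seed are illustrative placeholders rather than the authors' actual data or settings.

    # Sketch of stratified 60/20/20 splitting, SMOTE on the training set,
    # and feature standardization (variables X, y are illustrative).
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from imblearn.over_sampling import SMOTE

    # X: feature matrix for one sex-specific subset; y: binary weak/healthy labels
    rng = np.random.RandomState(42)
    X = rng.normal(size=(1000, 20))
    y = rng.binomial(1, 0.16, size=1000)

    # Stratified 60/20/20 split: hold out 20% as the test set first,
    # then split the remainder 75/25 so validation is 20% of the total
    X_trainval, X_test, y_trainval, y_test = train_test_split(
        X, y, test_size=0.20, stratify=y, random_state=42)
    X_train, X_val, y_train, y_val = train_test_split(
        X_trainval, y_trainval, test_size=0.25, stratify=y_trainval, random_state=42)

    # SMOTE is applied to the training set only, so validation and test
    # sets keep the original class ratio
    X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

    # Standardize features before feature selection; the scaler is fit on training data
    scaler = StandardScaler().fit(X_train_res)
    X_train_res = scaler.transform(X_train_res)
    X_val = scaler.transform(X_val)
    X_test = scaler.transform(X_test)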

The features were fed into a supervised classifier developed in Python 3 (Python Software Foundation, Delaware, US). The classifier considered in this analysis was a bagging ensemble model with additional balancing, using XGBoost as the base estimator. Balanced accuracy was used as the metric to quantify goodness-of-fit, comparing the predictions of the classifier with the true labels. A grid search was employed on the training set to find optimal values for the model hyper-parameters. Model fitting and feature selection (based on SelectKBest with the ANOVA F-value, f_classif, as the scoring function) were carried out simultaneously. For each combination of hyper-parameter values, a 5-fold cross-validation was performed on the training data and the corresponding balanced accuracy was obtained. The combination of hyper-parameters returning the highest balanced accuracy was taken as the optimum, and the selected model was evaluated on the validation set to assess its generalizability. Subsequently, the training and validation sets were merged into a single new training set, the model was re-trained (with the optimal hyper-parameters and selected features), and the balanced accuracy was obtained on the test set. This procedure was carried out for the binary classification labels defined using a T-score of -2 and then repeated for those defined using a T-score of -1.
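
The sketch below outlines this joint feature-selection and model-fitting procedure, assuming scikit-learn's SelectKBest and GridSearchCV, imbalanced-learn's BalancedBaggingClassifier, and xgboost's XGBClassifier; the hyper-parameter grid and the candidate numbers of selected features are illustrative assumptions, not the authors' exact values, and the data variables are reused from the previous sketch.

    # Hedged sketch of grid search with simultaneous feature selection and
    # model fitting, scored by balanced accuracy (illustrative grid values).
    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import balanced_accuracy_score
    from imblearn.ensemble import BalancedBaggingClassifier
    from xgboost import XGBClassifier

    # Feature selection and the balanced bagging ensemble (XGBoost base
    # estimator) are tuned jointly inside one pipeline
    pipe = Pipeline([
        ("select", SelectKBest(score_func=f_classif)),
        ("clf", BalancedBaggingClassifier(
            estimator=XGBClassifier(eval_metric="logloss"),  # 'estimator' in recent imbalanced-learn versions
            random_state=42)),
    ])

    param_grid = {                      # illustrative grid, not the authors' exact values
        "select__k": [10, 15, 20],
        "clf__n_estimators": [10, 25, 50],
        "clf__estimator__max_depth": [3, 5],
    }

    # 5-fold cross-validation on the training data, scored by balanced accuracy
    search = GridSearchCV(pipe, param_grid, scoring="balanced_accuracy", cv=5)
    search.fit(X_train_res, y_train_res)

    # Evaluate the selected model on the held-out validation set
    print("validation balanced accuracy:",
          balanced_accuracy_score(y_val, search.predict(X_val)))

    # Merge training and validation sets, re-train with the optimal
    # configuration, and report balanced accuracy on the test set
    best = search.best_estimator_
    best.fit(np.vstack([X_train_res, X_val]), np.concatenate([y_train_res, y_val]))
    print("test balanced accuracy:",
          balanced_accuracy_score(y_test, best.predict(X_test)))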
