
The machine learning analysis was conducted using the scikit-learn library in Python, following a systematic process to ensure rigor and reproducibility. The dataset was initially loaded and preprocessed by removing specific columns deemed irrelevant for the analysis. Feature scaling was performed using MinMaxScaler to normalize the data within the range of 0–1, ensuring that no single feature dominated the machine learning models (Pedregosa et al., 2011; de Amorim et al., 2023).
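The preprocessing step can be sketched as follows; the column names and values are illustrative placeholders, not the actual study variables:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical dataset with two columns ("id", "session") standing in
# for the columns deemed irrelevant for the analysis.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "session": ["a", "b", "c"],
    "feature_1": [10.0, 20.0, 30.0],
    "feature_2": [0.5, 1.5, 2.5],
})

# Drop the irrelevant columns before modeling.
X = df.drop(columns=["id", "session"])

# Scale every remaining feature into the 0-1 range so that no single
# feature dominates the machine learning models.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled.min(), X_scaled.max())
```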

The sample size for this study was 94 participants, which is larger than previous comparable studies in this field (Sakai et al., 2012; Yamamoto et al., 2020). However, machine learning models analyzing neuroimaging data ideally require 100+ participants to achieve stable feature selection (Vabalas et al., 2019).

To mitigate the potential limitations of our sample size and to enhance the robustness of the model evaluation, bootstrapping was employed. In each of 100 iterations, a bootstrap sample of the dataset was drawn and subsequently split into training (70%) and testing (30%) sets (Huang and Huang, 2023). This technique allows for a more reliable estimation of model performance across multiple subsamples of the data.
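A minimal sketch of this bootstrap evaluation, using synthetic data and a Random Forest as a stand-in model (all names and settings here are illustrative, not the study's actual configuration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the 94-participant dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(94, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=94) > 0).astype(int)

n_iterations = 100
scores = []
for i in range(n_iterations):
    # Draw a bootstrap sample (sampling with replacement) of the dataset.
    idx = rng.integers(0, len(X), size=len(X))
    Xb, yb = X[idx], y[idx]
    # Split each bootstrap sample 70/30 into training and testing sets.
    X_tr, X_te, y_tr, y_te = train_test_split(
        Xb, yb, test_size=0.3, random_state=i)
    model = RandomForestClassifier(n_estimators=50, random_state=i)
    model.fit(X_tr, y_tr)
    scores.append(accuracy_score(y_te, model.predict(X_te)))

print(f"mean accuracy over {n_iterations} bootstraps: {np.mean(scores):.3f}")
```

Aggregating the per-iteration test scores (mean and spread) is what yields the more reliable performance estimate described above.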

Additionally, dimensionality reduction in the form of feature selection was conducted using LASSO (Least Absolute Shrinkage and Selection Operator) regression with 5-fold cross-validation (Tibshirani, 1996) to balance model complexity with the available sample size. Given the 94 samples in our dataset, we constrained the LASSO to select between 7 and 17 features. This range was chosen based on the standard rule of thumb of approximately 10 samples per feature, which helps to prevent overfitting while still capturing important predictors (Friedman et al., 2010). Only the highest-importance features were retained for further analysis.
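The feature-selection step might be sketched as below. Note that scikit-learn's LassoCV does not directly accept a feature-count constraint, so the 7-17 band is enforced here by ranking absolute coefficients, which is one plausible reading of the procedure; the data are synthetic placeholders:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import MinMaxScaler

# Synthetic 94-sample dataset standing in for the study data.
X, y = make_regression(n_samples=94, n_features=40, n_informative=10,
                       noise=5.0, random_state=0)
X = MinMaxScaler().fit_transform(X)

# LASSO with 5-fold cross-validation over a path of regularization strengths.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)

# Rank features by absolute coefficient; keep at least 7 and at most 17,
# reflecting the ~10-samples-per-feature rule of thumb.
importance = np.abs(lasso.coef_)
ranked = np.argsort(importance)[::-1]
n_nonzero = int((importance > 0).sum())
n_keep = int(np.clip(n_nonzero, 7, 17))
selected = ranked[:n_keep]
print(f"selected {n_keep} features:", sorted(selected.tolist()))
```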

To define the critical decline in driving safety performance (DSP), we employed a systematic, data-driven process to determine the optimal percentile threshold. The 15th percentile threshold was selected based on the following steps:

Iterative threshold testing: We evaluated multiple percentile thresholds (10, 15, 20, and 25%) to identify the optimal split for our dataset.

Bidirectional analysis: For each threshold, we created binary groups using both top-X% vs. the rest and bottom-X% vs. the rest of the data.

Model development: We developed Random Forest models for each grouping, using 5-fold cross-validation to ensure robustness.

Performance comparison: We compared model performance across thresholds using multiple metrics:

Bottom 15%: Accuracy = 0.89, Precision = 0.72, Recall = 0.64, F1-score = 0.62, AUC = 0.85.

Other thresholds: Accuracy = 0.82–0.86, Precision = 0.65–0.70, Recall = 0.58–0.62, F1-score = 0.55–0.60, AUC = 0.78–0.82.

Consistency check: We found that the bottom 15% threshold consistently outperformed other splits across all six Driving Safety Behavior (DSB) categories.

Validation: We used bootstrapping (100 iterations) to validate the stability of our results, finding consistent performance (AUC variation: ± 2%) for the 15% threshold.
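The threshold search described in the steps above can be sketched as follows, with a synthetic driving score standing in for the real DSP measure and illustrative Random Forest settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-ins: 94 participants, 8 predictors, one driving score.
rng = np.random.default_rng(0)
X = rng.normal(size=(94, 8))
score = X[:, 0] + rng.normal(scale=0.5, size=94)

results = {}
for pct in (10, 15, 20, 25):
    for direction in ("bottom", "top"):
        # Bidirectional analysis: bottom-X% vs. rest, and top-X% vs. rest.
        cut = np.percentile(score, pct if direction == "bottom" else 100 - pct)
        y = ((score <= cut) if direction == "bottom"
             else (score >= cut)).astype(int)
        # Random Forest with 5-fold cross-validation for each grouping.
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
        results[(pct, direction)] = auc

best = max(results, key=results.get)
print("best split:", best, f"AUC = {results[best]:.2f}")
```

On real data, the same loop would also collect accuracy, precision, recall, and F1 for the comparison table reported above.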

While this threshold is not a standard statistical cutoff, it provided the most meaningful and stable separation in our dataset for identifying drivers with potentially critical declines in performance. This data-driven approach, combined with the expertise of driving instructors, offers a balance between statistical rigor and practical relevance in the context of driving safety assessment.

To address the class imbalance present in the dataset (14 vs. 84 participants), we employed the Synthetic Minority Over-sampling Technique (SMOTE). This technique oversamples the minority class in the training data to achieve balanced classes, improving the model's ability to learn from the underrepresented group (Q. Chen et al., 2022).

To identify the optimal classification algorithm for predicting critical decline in DSP, we conducted a comprehensive comparison of nine machine learning algorithms: Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, k-Nearest Neighbors, Naive Bayes, Support Vector Machine, Neural Network, and AdaBoost. All models were evaluated using 10-fold bootstrapping (n = 280 per classifier), with performance assessed across multiple metrics including accuracy, precision, recall, F1-score, and ROC-AUC. As described above, SMOTE was applied during model training to address the class imbalance. Statistical comparisons between model performances were conducted using ANOVA with post-hoc tests, using the Support Vector Machine as the reference classifier. The Random Forest classifier was ultimately selected based on its superior performance across these metrics.
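The model comparison could be sketched as below for a subset of the nine algorithms; the hyperparameters are illustrative defaults, the data are synthetic with an imbalance similar to the study's, and plain cross-validation stands in for the bootstrapping procedure:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Synthetic 94-sample dataset with roughly a 15% minority class.
X, y = make_classification(n_samples=94, n_features=10, weights=[0.85],
                           random_state=0)

# Four of the nine compared algorithms, with default-like settings.
models = {
    "LogReg": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "GradBoost": GradientBoostingClassifier(random_state=0),
    "SVM": SVC(random_state=0),
}

# Score each model with 5-fold cross-validated ROC-AUC and rank them.
aucs = {}
for name, model in models.items():
    aucs[name] = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

for name, auc in sorted(aucs.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} AUC = {auc:.3f}")
```

On the real data, the per-iteration metric distributions from the bootstrapping would feed the ANOVA and post-hoc comparisons described above.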

While we acknowledge the limitations of our small sample size, the use of bootstrapping and cross-validation helps to maximize the use of our available data and provides a more robust estimate of model performance. However, we recognize that these results should be interpreted with caution, and future studies with larger sample sizes are needed to confirm our findings.
