We used an unpaired two-sample Wilcoxon rank-sum test to identify statistically significant (p < 0.05) differences in graph measures between the two patient groups. We corrected for multiple comparisons using false discovery rate (FDR) correction with q < 0.05.
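The group comparison described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study. It assumes each group is stored as an array of shape (subjects, graph measures), and it assumes the Benjamini-Hochberg step-up procedure as the FDR method, which the text does not specify. The function names `benjamini_hochberg` and `compare_groups` are hypothetical.

```python
import numpy as np
from scipy.stats import ranksums


def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of tests that survive Benjamini-Hochberg FDR control at level q.

    Assumption: BH is used here as one common FDR procedure.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices of p-values, ascending
    thresholds = q * np.arange(1, m + 1) / m   # BH critical value for each rank
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])       # largest rank meeting its threshold
        reject[order[:k + 1]] = True           # step-up: reject all ranks up to k
    return reject


def compare_groups(group_a, group_b, q=0.05):
    """Run one rank-sum test per graph measure, then apply FDR correction.

    group_a, group_b: arrays of shape (n_subjects, n_measures) (assumed layout).
    """
    p_values = np.array([
        ranksums(group_a[:, j], group_b[:, j]).pvalue
        for j in range(group_a.shape[1])
    ])
    return p_values, benjamini_hochberg(p_values, q)
```

Note that `scipy.stats.ranksums` uses a normal approximation for the p-value. For small groups, an exact test may be preferable.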

During model training, the data were randomly divided into training and testing datasets, so the resulting model may differ slightly depending on the division. To address this, the SVM was run 100 times (Figures 1C–E), and the arithmetic mean of the accuracy, sensitivity, specificity, and AUC across the 100 repetitions was used as the final performance measure.
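The repeated-split procedure can be illustrated with a short scikit-learn sketch. The kernel, the test fraction, the stratified splitting, and the 0/1 label coding are assumptions made for illustration, since the text does not state them. The function name `repeated_svm` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def repeated_svm(X, y, n_repeats=100, test_size=0.3, seed=0):
    """Average SVM performance over repeated random train/test splits.

    Assumptions: binary labels coded 0/1, linear kernel, 30% test split.
    """
    scores = {"accuracy": [], "sensitivity": [], "specificity": [], "auc": []}
    for i in range(n_repeats):
        # A different random division of the data on every repetition.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed + i
        )
        clf = SVC(kernel="linear").fit(X_tr, y_tr)
        y_pred = clf.predict(X_te)
        tn, fp, fn, tp = confusion_matrix(y_te, y_pred, labels=[0, 1]).ravel()
        scores["accuracy"].append((tp + tn) / (tp + tn + fp + fn))
        scores["sensitivity"].append(tp / (tp + fn))   # true positive rate
        scores["specificity"].append(tn / (tn + fp))   # true negative rate
        # AUC is computed from continuous decision values, not hard labels.
        scores["auc"].append(roc_auc_score(y_te, clf.decision_function(X_te)))
    # Arithmetic mean of each measure across all repetitions.
    return {name: float(np.mean(vals)) for name, vals in scores.items()}
```

As a usage example on synthetic data, `X, y = make_classification(n_samples=80, n_features=10, random_state=0)` followed by `repeated_svm(X, y)` returns a dictionary with the four averaged measures.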

Statistical significance of the classification accuracy and AUC was tested using permutation testing with 1,000 permutations. For each permutation, the subjects' class (group) labels were randomly reassigned and the classification was repeated. The resulting accuracies formed a null distribution that was then used to calculate the p-value of the observed accuracy (i.e., the fraction of permutations that produced a greater accuracy than that of the classification models) (66).
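The permutation test can be sketched as below. Here `score_fn` is a hypothetical callable standing in for the full classification pipeline: it takes features and labels and returns a single performance value such as accuracy. The function names are likewise illustrative and are not taken from the study.

```python
import numpy as np


def permutation_p_value(observed, null_scores):
    """Fraction of permutation scores strictly greater than the observed score."""
    null_scores = np.asarray(null_scores, dtype=float)
    return float(np.mean(null_scores > observed))


def permutation_test(score_fn, X, y, n_permutations=1000, seed=0):
    """Build a null distribution by shuffling class labels and re-scoring.

    score_fn(X, y) -> float is an assumed interface for the classifier pipeline.
    """
    rng = np.random.default_rng(seed)
    observed = score_fn(X, y)
    # Shuffling labels breaks any true link between features and group.
    null_scores = [score_fn(X, rng.permutation(y)) for _ in range(n_permutations)]
    return observed, permutation_p_value(observed, null_scores)
```

With this definition the p-value can be exactly zero when no permutation exceeds the observed score. A commonly used variant adds one to both the numerator and the denominator to avoid this.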

