Comparison of network/power features using classification of node-pairs

The DCS data labels (PN positive or language negative) for node-pairs were taken as ground truth. Using standard machine-learning classifiers and a training and testing paradigm, the accuracy of node-pair classification was compared across several power and network feature spaces, namely (1) hγ power of the node-pairs, (2) in-degrees, (3) out-degrees, (4) both in- and out-degrees, and (5) coreness of nodes; and across combined network and power feature spaces: (6) in-degrees and power, (7) out-degrees and power, (8) in- and out-degrees and power, and (9) coreness of nodes and power. Multiple classifiers were used so that the results would not be biased by the choice of a particular classifier. Comparing classification accuracy across these nine feature spaces indicates which feature space better discriminates between PN positive and language negative node-pairs. The total number of labeled node-pairs (n) and the length of each feature vector (p) together define an n × p matrix (n < p), which is used for classification. For example, the power feature space was created by taking the power time series (from 52 windows) for each node of the node-pair and concatenating them to produce a vector of length 104 for that node-pair. Every DCS node-pair provided two data samples for classification, because the features of its two nodes can be concatenated in two orders. Specifically, each DCS node-pair was used to generate another labeled node-pair by reversing the order in which the node features were concatenated, thus doubling the number of available labeled node-pairs. As a result, the effective n for classification varied between 26 and 46 across patients, while p varied between 104 and 312, depending on the feature space considered. The number of labeled node-pairs was sufficient to estimate classification accuracies with a 95% confidence interval.
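The construction of the power feature space described above can be illustrated with a minimal sketch. It is not the authors' code: the function names `pair_features` and `augment_pairs` and the toy power values are hypothetical, and only the 52-window length, the concatenation, and the order reversal come from the text.

```python
def pair_features(node_a, node_b):
    """Concatenate the feature vectors of two nodes into one node-pair vector."""
    return list(node_a) + list(node_b)


def augment_pairs(pairs, labels):
    """Return both concatenation orders (a, b) and (b, a) for each labeled pair.

    The reversed sample keeps the label of the original node-pair,
    so the number of labeled samples doubles.
    """
    X, y = [], []
    for (node_a, node_b), label in zip(pairs, labels):
        X.append(pair_features(node_a, node_b))
        X.append(pair_features(node_b, node_a))
        y.extend([label, label])
    return X, y


# Toy data (assumed values): 3 labeled node-pairs, 52 power windows per node.
N_WINDOWS = 52
pairs = [
    ([float(i)] * N_WINDOWS, [float(i) + 0.5] * N_WINDOWS)
    for i in range(3)
]
labels = [1, 0, 1]  # 1 = PN positive, 0 = language negative (assumed encoding)

X, y = augment_pairs(pairs, labels)
n, p = len(X), len(X[0])  # rows and columns of the n x p matrix
```

In this toy case, 3 labeled node-pairs yield n = 6 samples of length p = 104. The combined feature spaces would append network features (degrees or coreness) to each node's vector before concatenation.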
For each feature space, 5-fold cross-validation was performed with repeated random splits of the data (Varoquaux et al., 2017), keeping the training and test sets stratified so that the split was balanced between the two classes. The results were averaged over the test sets. Remark: Our previous efforts using only the original DCS node-pairs had insufficient samples for 5-fold cross-validation and confidence interval estimation. Classification accuracies obtained with leave-one-out cross-validation were similar to the results and trends among feature spaces presented in this paper, but did not include the additional statistics provided here.
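A repeated, stratified 5-fold split of the kind described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' procedure: the helper names, the number of repeats, the seed handling, and the class sizes in the example are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread evenly over k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    pos = 0  # running counter so fold sizes differ by at most one sample
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random assignment within each class
        for idx in indices:
            folds[pos % k].append(idx)
            pos += 1

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


def repeated_stratified_kfold(labels, k=5, n_repeats=10, seed=0):
    """Repeat the stratified k-fold split with a different shuffle each time."""
    for repeat in range(n_repeats):
        yield from stratified_kfold(labels, k=k, seed=seed + repeat)


# Toy labels (assumed class sizes): 16 PN positive and 10 language negative samples.
labels = [1] * 16 + [0] * 10
splits = list(repeated_stratified_kfold(labels, k=5, n_repeats=3))
```

A classifier would be fit on each `train` index set and scored on the matching `test` set, with the scores averaged over all test sets.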

A brief note on notation: TP (FP) stands for true (false) positives, and TN (FN) for true (false) negatives. The true positive rate (TPR), also known as sensitivity or recall, is given by $\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$. The true negative rate (TNR), also called specificity or selectivity, is $\mathrm{TNR} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}}$. Precision is given by $\frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}$. For all patients, the numbers of PN positive and language negative node-pairs were not equal, so the balanced accuracy metric was used, which normalizes the true positive and true negative predictions by the size of their respective classes: $\text{Balanced accuracy} = \frac{\mathrm{TPR} + \mathrm{TNR}}{2}$.
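The metrics defined above can be computed directly from the confusion counts. The following is a minimal sketch with hypothetical function names and made-up predictions, assuming PN positive is encoded as 1 and that both classes are present in the true labels.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary classification."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of TPR and TNR; assumes both classes occur in y_true."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    tpr = tp / (tp + fn)  # sensitivity / recall
    tnr = tn / (tn + fp)  # specificity / selectivity
    return (tpr + tnr) / 2


# Imbalanced toy example (made-up labels): 6 positives, 2 negatives.
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
plain_acc = (tp + tn) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
```

Here TPR = 5/6 and TNR = 1/2, so the balanced accuracy is 2/3, while the plain accuracy is 0.75. The gap shows how the majority class inflates plain accuracy when the classes are unequal.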
