Our third aim was to construct three classification algorithms using a machine learning approach to differentiate between (1) Alzheimer's disease patients and controls; (2) bvFTD patients and controls; and (3) Alzheimer's disease and bvFTD patients. Due to its small size, the svPPA sample was not considered suitable for this analysis. Only the data from the discovery sample were used to generate the classification algorithms.
The data set used in any machine learning classifier must be carefully prepared. First, the data were normalized so that features measured on different scales would not differ in dispersion. Normalization rescales each dimension of the data to standardize the range of the features, since differences in range can critically affect the results (Graf and Borer, 2001).
$$x' = \frac{x - \mu}{\sigma}$$

where $x$ is the value of a feature, and $\mu$ and $\sigma$ are the mean value and the standard deviation of the feature set, respectively.
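As an illustration of this normalization step, the following minimal Python sketch standardizes each feature column to zero mean and unit standard deviation. The function name `zscore_columns`, the example values, and the use of the sample standard deviation (n − 1 denominator) are assumptions made for illustration; the original analysis was run in Matlab and its exact settings are not stated here.

```python
import statistics

def zscore_columns(rows):
    """Standardize each column: subtract its mean and divide by its standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Assumption: sample standard deviation (n - 1 denominator).
    stds = [statistics.stdev(col) for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical data: one row per subject, one column per feature
# (for example a latency in ms and a dimensionless gain).
features = [[310.0, 0.90], [250.0, 1.00], [280.0, 0.95], [220.0, 1.05]]
normalized = zscore_columns(features)
```

After this step both columns share a comparable range, so a feature expressed in large units does not dominate one expressed in small units.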
In our study, the number of subjects was smaller than the number of features, so the partial least squares regression (PLSR or PLS) technique was applied to reduce the number of variables.
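Let $X \in \mathbb{R}^{n \times N}$ be the set of independent features and $Y \in \mathbb{R}^{n \times M}$ the set of dependent features. The relation between the two sets is given by a score vector, which we computed using partial least squares correlation (PLSC) (Krishnan et al., 2011). The feature sets are then defined by:

$$X = TP^{\top} + E, \qquad Y = UQ^{\top} + F$$

where $T \in \mathbb{R}^{n \times p}$ and $U \in \mathbb{R}^{n \times p}$ are the score matrices, $P \in \mathbb{R}^{N \times p}$ and $Q \in \mathbb{R}^{M \times p}$ are the weight matrices, and $E \in \mathbb{R}^{n \times N}$ and $F \in \mathbb{R}^{n \times M}$ are the residual matrices. The PLS2 algorithm was used to compute each matrix.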
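To make the decomposition concrete, the sketch below implements one common NIPALS formulation of PLS2 in plain Python. It is an illustrative assumption, not the authors' Matlab code: the function name `nipals_pls2`, the helper functions, the toy data, and the choice to deflate Y with the X-scores (so that the centered Y equals the sum of t·qᵀ terms plus F) are all choices made for this example. Score and weight vectors are returned as lists of components, so `T[k]` is the k-th score vector of length n.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(A):
    """Subtract each column's mean."""
    means = [sum(col) / len(col) for col in zip(*A)]
    return [[a - m for a, m in zip(row, means)] for row in A]

def nipals_pls2(X, Y, n_components, tol=1e-10, max_iter=500):
    """Extract n_components PLS components; returns T, U, P, Q and residuals E, F."""
    E, F = center(X), center(Y)
    T, U, P, Q = [], [], [], []
    for _ in range(n_components):
        u = [row[0] for row in F]              # start from the first Y column
        t_old = None
        for _ in range(max_iter):
            w = matvec(transpose(E), u)        # X-weights
            norm = math.sqrt(dot(w, w))
            w = [x / norm for x in w]
            t = matvec(E, w)                   # X-scores
            q = [x / dot(t, t) for x in matvec(transpose(F), t)]
            u = [x / dot(q, q) for x in matvec(F, q)]   # Y-scores
            if t_old is not None and max(abs(a - b) for a, b in zip(t, t_old)) < tol:
                break
            t_old = t
        p = [x / dot(t, t) for x in matvec(transpose(E), t)]
        # Deflate both blocks by the part explained by this component.
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
        F = [[f - ti * qj for f, qj in zip(row, q)] for row, ti in zip(F, t)]
        T.append(t); U.append(u); P.append(p); Q.append(q)
    return T, U, P, Q, E, F

# Hypothetical data: n = 6 subjects, N = 3 features, M = 2 responses.
X = [[2.0, 1.0, 0.5], [3.0, 1.5, 1.0], [4.0, 3.0, 0.2],
     [5.0, 2.5, 1.5], [6.0, 4.0, 0.8], [7.0, 3.5, 2.0]]
Y = [[1.0, 0.2], [1.4, 0.1], [2.2, 0.9], [2.4, 0.4], [3.1, 1.0], [3.3, 0.6]]
T, U, P, Q, E, F = nipals_pls2(X, Y, n_components=2)
```

With p components retained, the n × p score matrix replaces the original n × N feature matrix, which is how the technique reduces the number of variables when subjects are fewer than features.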
Finally, the Fisher discriminant ratio (FDR) was used to select the features for training the machine learning algorithm. One of the main advantages of the FDR is that the sets of features can be associated with a label such as Alzheimer's disease, bvFTD or controls. The FDR is defined by:

$$\mathrm{FDR} = \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2}$$

where $\mu_i$ and $\sigma_i^2$ are the mean and variance of the set $i$.
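A short sketch of FDR-based feature ranking follows, assuming the usual two-class form of the ratio (squared difference of the group means divided by the sum of the group variances). The feature names and values are hypothetical and serve only to show how a discriminative feature is ranked above a non-discriminative one.

```python
import statistics

def fisher_discriminant_ratio(group_a, group_b):
    """Two-class FDR: (mean_a - mean_b)^2 / (var_a + var_b)."""
    mu_a, mu_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    return (mu_a - mu_b) ** 2 / (var_a + var_b)

def rank_features(features_a, features_b):
    """Return feature names sorted from highest to lowest FDR.

    features_a, features_b: dicts mapping a feature name to that group's values.
    """
    scores = {name: fisher_discriminant_ratio(features_a[name], features_b[name])
              for name in features_a}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical values for two groups (for example patients and controls).
patients = {"latency": [310.0, 330.0, 350.0, 340.0], "gain": [0.90, 1.00, 0.95, 1.05]}
controls = {"latency": [200.0, 210.0, 190.0, 220.0], "gain": [0.92, 1.02, 0.97, 1.03]}
ranking = rank_features(patients, controls)
```

Features at the top of the ranking separate the two labeled groups best relative to their within-group spread and are the natural candidates to pass to the classifier.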
In the Matlab environment, we performed multiple tests to find the most suitable combination of oculomotor features and type of machine learning algorithm for each pair of groups (Alzheimer's disease patients vs. controls; bvFTD vs. controls; Alzheimer's disease vs. bvFTD), as shown in Figure 1. The most suitable combination was defined as the one with the largest area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The classifiers were modeled through a cross-validation process in which the sample was divided into five subsets: four (80% of the sample) for training and one (20% of the sample) for testing the classification.
Flow chart to find the most suitable combination between eye movement features and machine learning algorithm. AUC, area under the curve; FDR, Fisher discriminant ratio; ML, machine learning; PLSR, partial least squares regression.
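The 80%/20% partition described above corresponds to a five-fold split. The sketch below shows one way such a split could be generated; the function name `five_fold_splits`, the shuffling, and the fixed seed are assumptions for illustration and do not reproduce the internal Matlab routine.

```python
import random

def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # assumption: random assignment to folds
    folds = [indices[k::n_folds] for k in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, test))
    return splits

# Hypothetical sample of 50 subjects: each fold trains on 40 and tests on 10.
splits = five_fold_splits(50)
```

Each candidate combination of features and algorithm would be trained and scored on every pair, and the combination with the largest AUC retained.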
Despite the cross-validation performed in Matlab, we decided to implement the selected algorithm in C++ (Microsoft Visual Studio) in order to obtain software independent of Matlab. This implementation also allowed us to carry out the cross-validation of the algorithm with the whole set of samples, rather than a subset as in the previous case, through a loop of 1,000 iterations. Moreover, in the C++ environment, the confidence interval of the algorithm was computed following the flow chart displayed in Figure 2.
Loop implemented to test the confidence interval of the selected machine learning algorithm. AUC, area under the curve; I, iterations; ML, machine learning; ROC, receiver operating characteristic.
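The exact steps of the loop in Figure 2 are not reproduced in the text, so the following is only a sketch of one common way to obtain a confidence interval for the AUC over 1,000 iterations. It swaps in bootstrap resampling of classifier scores with a percentile interval, which is an assumption and may differ from the authors' C++ procedure; the function names and the example scores are hypothetical.

```python
import random

def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_interval(scores, labels, iterations=1000, level=0.95, seed=0):
    """Return (mean AUC, lower bound, upper bound) over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    while len(aucs) < iterations:
        idx = [rng.randrange(n) for _ in range(n)]     # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:                            # both classes must be present
            aucs.append(auc([scores[i] for i in idx], ys))
    aucs.sort()
    lower = aucs[int((1 - level) / 2 * iterations)]
    upper = aucs[int((1 + level) / 2 * iterations) - 1]
    return sum(aucs) / iterations, lower, upper

# Hypothetical classifier outputs: label 1 = patient, label 0 = control.
scores = [0.15, 0.30, 0.35, 0.40, 0.55, 0.45, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
mean_auc, lower, upper = auc_interval(scores, labels)
```

Fixing the seed makes the interval reproducible between runs, which is useful when comparing implementations in different environments.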
Finally, we were interested in applying the classification algorithms for the differentiation between controls and each dementia group (Alzheimer's disease patients vs. controls and bvFTD vs. controls) in independent samples of patients, with the aim of testing their external validity. To do so, the previously generated algorithms were applied to the independent samples of Alzheimer's disease and bvFTD patients, and their classification accuracy was assessed using ROC curves.
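For reference, a ROC curve for an independent sample can be built from the outputs of an already trained classifier by sweeping the decision threshold over the scores. The sketch below is a generic illustration with hypothetical scores and labels, not the authors' implementation; the function names are chosen for this example.

```python
def roc_curve(scores, labels):
    """Return (false positive rate, true positive rate) points, from (0, 0) to (1, 1)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical outputs of a previously trained classifier on an independent sample.
validation_scores = [0.9, 0.7, 0.6, 0.3, 0.2]
validation_labels = [1, 1, 0, 1, 0]
curve = roc_curve(validation_scores, validation_labels)
validation_auc = trapezoid_auc(curve)
```

In an external validation, the classifier and any preprocessing parameters are kept fixed from the discovery sample, and only the scoring and ROC computation are repeated on the new subjects.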