A custom module was built to classify samples using two layers of models. (i) Three linear-kernel support vector machine (SVM) models: a malignant versus healthy model (MH model), a malignant versus benign model (MB model), and a benign versus healthy model (BH model). Each model searches for the hyperplane that maximizes the margin between its two pre-defined training classes. As with all linear classifiers, the decision function takes the form f(x) = wᵀx + b, where x is the feature vector of a sample, w = [w1, w2, …, wk]ᵀ is the weight vector, and b is the bias term that sets the offset of the hyperplane from the origin. (ii) A multinomial logistic regression model: for each sample, the outputs of the MH, MB, and BH models were fed into a multinomial logistic regression model to obtain a cancer/benign/healthy assignment as the final prediction. Both layers were trained with the stochastic gradient descent (SGD) algorithm, and performance on the training set was assessed by iterated 5-fold cross-validation. During the independent validation phase, the model with locked parameters was applied directly to the blind samples, and the clinical information was not released until all analyses were completed.
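The two-layer scheme described above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' module: the function names, hyperparameters (learning rate, epochs, regularization strength), and the synthetic two-feature data are all hypothetical, and the source does not state which software was used or whether the second layer was fitted on in-sample or out-of-fold SVM scores (in-sample is assumed here). Layer 1 fits three pairwise linear SVMs by SGD on the hinge loss, layer 2 fits a multinomial logistic regression by SGD on the three decision values, and a repeated (iterated) 5-fold cross-validation estimates accuracy.

```python
import math
import random

LABELS = ["healthy", "benign", "malignant"]          # class indices 0, 1, 2
PAIRS = {"MH": (2, 0), "MB": (2, 1), "BH": (1, 0)}   # (positive, negative) class

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_svm(X, y, epochs=50, lr=0.01, lam=0.01, seed=0):
    """Linear SVM: SGD on the L2-regularised hinge loss, labels y in {-1, +1}."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * (dot(w, X[i]) + b) < 1   # margin violated?
            w = [wj * (1 - lr * lam) for wj in w]      # shrink step (regulariser)
            if violated:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train_softmax(Z, y, n_classes=3, epochs=50, lr=0.05, seed=0):
    """Multinomial logistic regression fitted by SGD on the cross-entropy loss."""
    rng = random.Random(seed)
    W = [[0.0] * len(Z[0]) for _ in range(n_classes)]
    B = [0.0] * n_classes
    order = list(range(len(Z)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = softmax([dot(W[c], Z[i]) + B[c] for c in range(n_classes)])
            for c in range(n_classes):
                g = p[c] - (1.0 if y[i] == c else 0.0)  # gradient w.r.t. logit c
                W[c] = [wj - lr * g * zj for wj, zj in zip(W[c], Z[i])]
                B[c] -= lr * g
    return W, B

def svm_scores(svms, x):
    """Layer-1 output: decision values w.x + b of the MH, MB and BH models."""
    return [dot(w, x) + b for w, b in (svms[name] for name in PAIRS)]

def fit_two_layer(X, y):
    svms = {}
    for name, (pos, neg) in PAIRS.items():
        rows = [i for i in range(len(X)) if y[i] in (pos, neg)]
        svms[name] = train_svm([X[i] for i in rows],
                               [1 if y[i] == pos else -1 for i in rows])
    Z = [svm_scores(svms, x) for x in X]   # in-sample scores (an assumption)
    return svms, train_softmax(Z, y)

def predict(model, x):
    svms, (W, B) = model
    z = svm_scores(svms, x)
    logits = [dot(W[c], z) + B[c] for c in range(len(B))]
    return logits.index(max(logits))

def repeated_kfold_accuracy(X, y, k=5, repeats=2, seed=0):
    """Mean held-out accuracy over `repeats` reshuffled k-fold splits."""
    rng = random.Random(seed)
    accs = []
    for _ in range(repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        for f in range(k):
            fold = idx[f::k]
            held = set(fold)
            tr = [i for i in idx if i not in held]
            m = fit_two_layer([X[i] for i in tr], [y[i] for i in tr])
            accs.append(sum(predict(m, X[i]) == y[i] for i in fold) / len(fold))
    return sum(accs) / len(accs)

# Synthetic, well-separated toy data: 30 samples per class, two features.
rng = random.Random(42)
CENTERS = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
X = [[rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)] for cx, cy in CENTERS for _ in range(30)]
y = [c for c in range(3) for _ in range(30)]

model = fit_two_layer(X, y)
train_acc = sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)
cv_acc = repeated_kfold_accuracy(X, y)
print(LABELS[predict(model, [0.2, 2.8])], train_acc, cv_acc)
```

Once `fit_two_layer` returns, the parameters are fixed, so calling `predict(model, x)` on unseen samples mirrors the locked-model validation step. The folds here are shuffled but not stratified; with small or imbalanced cohorts, a stratified split would be the safer choice.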