2.6. Biomarker evaluation protocol

Leonardo Gutiérrez-Gómez
Jakub Vohryzek
Benjamin Chiêm
Philipp S. Baumann
Philippe Conus
Kim Do Cuenod
Patric Hagmann
Jean-Charles Delvenne

Our evaluation methodology is based on that of Abeel et al. (2010), developed for biomarker identification in cancer diagnosis on microarray data. To assess the robustness of the biomarker selection process, we generate slight variations of the dataset and compare the sets of selected features across these variations. For a stable marker selection algorithm, small variations in the training set should not produce substantial changes in the retrieved set of features.

We perform a nested 5-fold cross-validation (CV) approach. The outer CV provides an unbiased estimate of the performance of the method, whereas the inner CV loop is used to fit, tune, and select the optimal parameters of the model. Concretely, we generate 100 subsamplings of the original dataset by shuffling the outer 5-fold CV scheme 20 times. Eighty percent of the data, i.e., four folds of the outer CV (pink in Fig. 1), is used as the training set within the inner CV, where the best model and features are selected. That is, four inner folds are used as the training set and the held-out fold as the validation set to tune the parameters of the model. The model achieving the best performance on the validation set is selected, together with the features retained by the RFE-SVM method. The remaining 20% of the outer CV, i.e., the held-out fold, is used as the testing set to provide an unbiased evaluation of the final model and assess the performance of the classifier. The overall accuracy is therefore the average testing accuracy across subsamplings. See Fig. 1 for a schematic view of the methodology.
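The outer subsampling scheme (20 shuffles of a 5-fold split, yielding 100 train/test partitions) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name and parameters are ours, not the authors', and the inner CV and RFE-SVM steps are omitted.

```python
import random

def outer_cv_splits(n_samples, n_folds=5, n_shuffles=20, seed=0):
    """Yield the n_shuffles * n_folds outer train/test subsamplings.

    Each shuffle re-partitions the sample indices into n_folds folds;
    each fold in turn serves as the held-out ~20% testing set while the
    remaining ~80% forms the training set for the inner CV.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(n_shuffles):
        rng.shuffle(indices)
        fold_size = n_samples // n_folds
        folds = [indices[i * fold_size:(i + 1) * fold_size]
                 for i in range(n_folds)]
        # distribute any remainder samples over the first folds
        for j, extra in enumerate(indices[n_folds * fold_size:]):
            folds[j].append(extra)
        for k in range(n_folds):
            test = folds[k]
            train = [i for j, fold in enumerate(folds) if j != k
                     for i in fold]
            yield train, test
```

With the defaults this generator produces exactly 100 disjoint train/test partitions, one per subsampling, matching the 20 x 5 scheme described above.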

Fig. 1. Overview of the proposed method. The figure represents the nested 5-fold CV subsampling of the entire dataset (top-left gray bar). (Left) The outer CV is used to evaluate the performance of the model. Eighty percent of the data, i.e., four folds (pink box), is used as the training set, where the best model and features are selected. The remaining 20% is used as the testing set to evaluate the performance of the model. (Right) Within the inner CV, four folds are used for training and the held-out fold as the validation set. The best model, features, and parameters are selected according to the best CV accuracy. The outer CV is shuffled 20 times, generating 100 subsamplings of the dataset and therefore the same number of selected-feature 'fingerprints'. The stability of the selected biomarkers and the final accuracy are assessed over all subsampling estimates.
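The stability of the 100 feature 'fingerprints' can be summarized with a set-overlap score. The excerpt does not name the exact stability index used, so treat the choice below, the average pairwise Jaccard similarity between selected-feature sets, as an illustrative assumption:

```python
from itertools import combinations

def average_jaccard_stability(fingerprints):
    """Average pairwise Jaccard similarity over all selected-feature sets.

    Each fingerprint is an iterable of selected feature identifiers.
    Values near 1 indicate a stable selection across subsamplings;
    values near 0 indicate an unstable one.
    """
    sets = [set(f) for f in fingerprints]
    pairs = list(combinations(sets, 2))
    total = sum(len(a & b) / len(a | b) for a, b in pairs)
    return total / len(pairs)
```

For example, 100 identical fingerprints give a score of 1.0, while mutually disjoint fingerprints give 0.0.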
