The data should be randomly partitioned into training and test datasets that do not overlap. Models are trained on the training dataset and then evaluated on the test dataset.

K-fold cross-validation (C.V.) is a common sampling strategy for this purpose. The data are randomly divided into K disjoint subsets of equal size. In each round, one of the K subsets serves as the test dataset and the remaining K-1 subsets together form the training dataset. The model is thus trained K times, once per training dataset, and each trained model is evaluated on its corresponding test dataset.
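The partitioning described above can be sketched in a few lines of standard-library Python. This is only an illustration of generic K-fold splitting, not the authors' code: the function names `k_fold_indices` and `train_test_splits` and the `seed` parameter are hypothetical, and the folds are "equal-size" only up to a difference of one sample when the dataset size is not divisible by K.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Randomly split range(n_samples) into k disjoint folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold j takes every k-th shuffled index starting at j,
    # so fold sizes differ by at most one.
    return [indices[j::k] for j in range(k)]


def train_test_splits(n_samples, k, seed=0):
    """Yield one (train_indices, test_indices) pair per fold."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        # The training set is the union of all folds except the i-th.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each sample index appears in exactly one test fold, so every record is used for evaluation once and for training K-1 times.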

Before sampling, features with a missing-value rate higher than 20% are removed from the study, and patient records with a missing-value rate higher than 20% are likewise excluded. Fivefold C.V. is then used to sample from the collected dataset.
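A minimal sketch of this filtering step follows. It assumes the data are held as a list of equal-length rows with `None` marking a missing value, that features are filtered first, and that each record's missing-value rate is then computed over the remaining features; the text does not state these details, and the name `drop_sparse` is hypothetical.

```python
def drop_sparse(rows, threshold=0.20):
    """Drop features, then records, whose missing-value rate exceeds threshold.

    rows: list of equal-length lists; None marks a missing value.
    Returns (kept_column_indices, kept_rows).
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    # Keep a feature only if its share of missing values is at most the threshold.
    keep_cols = [
        c for c in range(n_cols)
        if sum(r[c] is None for r in rows) / n_rows <= threshold
    ]
    reduced = [[r[c] for c in keep_cols] for r in rows]
    # Assumption: record-level rates are measured on the remaining features.
    kept_rows = [
        r for r in reduced
        if not r or sum(v is None for v in r) / len(r) <= threshold
    ]
    return keep_cols, kept_rows
```

A rate of exactly 20% is kept here, since the text excludes only rates "higher than 20%".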

First, the dataset is partitioned into non-overlapping subsets D1, D2, …, DK according to the K-fold cross-validation strategy. For each i with 1 ≤ i ≤ K, the ith training dataset consists of all subsets D1, …, DK except Di, and the ith test dataset is Di. Each training dataset is balanced using an over-sampling strategy.
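The text does not say which over-sampling method is used. As one possible illustration, the sketch below applies plain random over-sampling (duplicating minority-class records at random) to a training fold only, leaving the test fold untouched. The function name `random_oversample` and its `seed` parameter are hypothetical.

```python
import random


def random_oversample(labels, seed=0):
    """Return indices into `labels` with every class padded to the majority size.

    Minority-class indices are duplicated by sampling with replacement.
    This is random over-sampling, chosen here only as an example.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    target = max(len(idxs) for idxs in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)  # keep every original record
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))  # pad with duplicates
    return balanced
```

Balancing only the ith training dataset keeps duplicated records out of Di, so the test fold still reflects the original class distribution.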

Moreover, a strategy for assessing the structural risk of classification, named A-Test, is used; it is described in more detail in the evaluation and validation subsection. Each of the five folds contains 324–325 positive and 1926–1927 negative instances. Therefore, the imbalance ratio of the training set in each of the five folds is about 0.168.
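The reported ratio can be checked from the per-fold counts, taking the imbalance ratio to be positives divided by negatives (an assumption, though it is the reading that reproduces the stated value). The helper name `imbalance_ratio` is hypothetical.

```python
def imbalance_ratio(n_positive, n_negative):
    """Ratio of positive (minority) to negative (majority) instances."""
    return n_positive / n_negative


# Per-fold counts quoted in the text: 324-325 positives, 1926-1927 negatives.
low = imbalance_ratio(324, 1927)   # smallest ratio the quoted counts allow
high = imbalance_ratio(325, 1926)  # largest ratio the quoted counts allow
```

Both bounds lie within 0.001 of 0.168. Because every fold has nearly the same class composition, a training set built from four folds has essentially the same ratio.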
