The UCSD-AI4H dataset includes 349 COVID-19 CT images and 397 non-COVID-19 CT images; Fig. 1 shows some examples. The CT images were resized to 224 × 224 and then divided into training, validation, and test sets by patient ID. Table 4 shows the statistics of these three subsets.
Table 4. UCSD-AI4H dataset split.
The Italiancase dataset consists of 338 COVID-19 CT images and 397 non-COVID-19 CT images; Fig. 2 shows some examples. The CT images were likewise resized to 224 × 224. Table 5 shows the statistics of the training, validation, and test sets.
Fig. 2. Four examples of Italiancase CT images that are positive for COVID-19.
Table 5. Italiancase dataset split.
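The sketch below illustrates the patient-level splitting and 224 × 224 resizing described above. It assumes the slices are available as a list of (image_path, patient_id, label) records; the `GroupShuffleSplit` grouping mirrors the by-patient split in the text, while the split ratios and helper names are illustrative assumptions rather than the paper's settings.

```python
# Sketch: patient-level train/validation/test split and 224x224 resizing.
# Assumption: `records` is a list of (image_path, patient_id, label) tuples
# built elsewhere; the 80/10/10 ratios are illustrative, not from the paper.
from sklearn.model_selection import GroupShuffleSplit
from PIL import Image

def split_by_patient(records, seed=0):
    groups = [pid for _, pid, _ in records]
    # Split off the test set first, then split the remainder into train/val,
    # always grouping by patient ID so one patient never spans two subsets.
    outer = GroupShuffleSplit(n_splits=1, test_size=0.10, random_state=seed)
    train_val_idx, test_idx = next(outer.split(records, groups=groups))
    train_val = [records[i] for i in train_val_idx]
    tv_groups = [pid for _, pid, _ in train_val]
    inner = GroupShuffleSplit(n_splits=1, test_size=0.111, random_state=seed)
    train_idx, val_idx = next(inner.split(train_val, groups=tv_groups))
    return ([train_val[i] for i in train_idx],
            [train_val[i] for i in val_idx],
            [records[i] for i in test_idx])

def load_resized(path, size=(224, 224)):
    # Resize every CT slice to 224 x 224 as described in the text.
    return Image.open(path).convert("RGB").resize(size, Image.BILINEAR)
```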
Fig. 3 depicts the histograms of the pixel intensities of all CT images in the two datasets after normalization.
Fig. 3. Histogram plots of the normalized pixel intensities of the images.
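A histogram such as the one in Fig. 3 could be produced as sketched below, assuming the CT slices are already loaded as NumPy arrays; the function name, variable names, and bin count are illustrative assumptions.

```python
# Sketch: histogram of normalized pixel intensities for a set of CT images.
# Assumption: `images` is an iterable of NumPy arrays from one dataset;
# the bin count is an illustrative choice.
import numpy as np
import matplotlib.pyplot as plt

def plot_intensity_histogram(images, title):
    # Normalize each image to zero mean and unit standard deviation,
    # then pool all pixel values into a single histogram.
    pooled = np.concatenate(
        [((img - img.mean()) / (img.std() + 1e-8)).ravel() for img in images]
    )
    plt.hist(pooled, bins=100)
    plt.xlabel("Normalized pixel intensity")
    plt.ylabel("Pixel count")
    plt.title(title)
    plt.show()
```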
An important problem with training neural networks on small datasets is that the trained models do not perform well on the validation and test sets. A variety of methods have been proposed to reduce this overfitting, the simplest of which is to add regularization terms on the network weights to the loss function [30]. Another popular technique is dropout, which probabilistically removes neurons from a given layer during training or discards certain connections [31]. Data augmentation is another way to reduce overfitting. Currently, a widespread and well-accepted practice of image data augmentation is geometric and color augmentation [32], such as reflecting, cropping, and translating the image, changing its color palette, and applying geometric transformations (rotation, resizing, and so on). Image augmentation algorithms include geometric transformations, color space augmentations, kernel filters, random erasing, adversarial training, and meta-learning [33]. Among them, the basic image-processing augmentation methods are geometric transformations (flipping, cropping, rotation) and color space transformations.
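As a concrete illustration of the first two remedies mentioned above (weight regularization and dropout), a minimal PyTorch sketch follows; the layer sizes, dropout rate, and weight-decay value are illustrative assumptions, not settings taken from this work.

```python
# Sketch: weight regularization and dropout in PyTorch.
# The architecture and hyperparameters below are illustrative only.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(224 * 224 * 3, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly drop half of the activations during training
    nn.Linear(256, 2),   # two classes: COVID-19 vs. non-COVID-19
)

# L2 regularization is added through the optimizer's weight_decay term.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
```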
In [32], images of dogs and cats were selected from the tiny-imagenet-200 dataset for a binary classification experiment. Without any data augmentation, the accuracy on the validation set was 85.5%; with traditional data augmentation, it improved to 89%, which indicates that traditional data augmentation has some limited effect on accuracy. The image augmentation methods used in this study are all basic methods. The input images were standardized to have zero mean and unit standard deviation and then cropped to 224 × 224 × 3. For the UCSD-AI4H and Italiancase datasets, the data augmentation methods and values applied to each image are shown in Table 6.
Table 6. Data augmentation.
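Since the exact values belong to Table 6, the torchvision pipeline below only sketches the kind of per-image processing described in the text (224 × 224 cropping, basic geometric and color transformations, and normalization); the probabilities, ranges, and normalization statistics are illustrative assumptions.

```python
# Sketch: basic per-image augmentation pipeline with torchvision.
# The rotation range, jitter strengths, and normalization statistics are
# illustrative placeholders, not the values from Table 6.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),                      # crop to 224 x 224
    transforms.RandomHorizontalFlip(),                      # geometric: flipping
    transforms.RandomRotation(degrees=10),                  # geometric: rotation
    transforms.ColorJitter(brightness=0.2, contrast=0.2),   # color: brightness/contrast
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5],              # push intensities toward
                         std=[0.5, 0.5, 0.5]),              # zero mean, unit std
])
```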
Fig. 4 shows several examples after data augmentation, including brightness and contrast adjustment and rotation.
Fig. 4. Example transformations after data augmentation.
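A figure like Fig. 4 can be reproduced by applying the augmentation pipeline several times to the same slice and displaying the results; the sketch below assumes the hypothetical `train_transform` defined above and an arbitrary image path.

```python
# Sketch: visualize a few augmented versions of one CT slice (cf. Fig. 4).
# Assumes `train_transform` from the previous sketch and a hypothetical file path.
import matplotlib.pyplot as plt
from PIL import Image

image = Image.open("example_ct_slice.png").convert("RGB")  # hypothetical file
fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for ax in axes:
    augmented = train_transform(image)                 # tensor in [C, H, W] order
    display = augmented * 0.5 + 0.5                    # undo normalization for display
    ax.imshow(display.permute(1, 2, 0).clamp(0, 1))
    ax.axis("off")
plt.show()
```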