The VGG16 network was proposed by the Visual Geometry Group of Oxford University in 2014. Compared with the earlier image classification network AlexNet, VGG16 [23] uses three 3 × 3 convolution kernels in place of a 7 × 7 convolution kernel and two 3 × 3 convolution kernels in place of a 5 × 5 convolution kernel, which increases the depth of the network while preserving the same receptive field and thus greatly improves the model's classification ability. ResNet50 [24] builds on previous networks and introduces residual learning. By adding residual units, it alleviates the exploding- and vanishing-gradient problems caused by increasing network depth, so the network can be deepened more effectively and the model's classification ability improves. InceptionV3 [25] uses a multiscale parallel split-and-merge treatment of visual information, decomposing each convolutional layer into two one-dimensional convolutional layers, which speeds up computation, increases the depth and nonlinear expressive ability of the network, and effectively helps avoid overfitting. Xception [26] is an extended evolution of InceptionV3: it replaces the original convolution with depthwise separable convolution, splitting an ordinary convolution into a depthwise convolution followed by a pointwise convolution. While maintaining the original classification accuracy, this greatly reduces the number of model parameters and improves running speed.
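To make the parameter saving concrete, the minimal Keras sketch below compares a standard convolution with a depthwise separable convolution of the same kernel size and output channels; the input shape and channel counts are arbitrary example values, not taken from this paper.

```python
# Parameter comparison: standard vs. depthwise separable convolution.
# Input shape and channel counts are arbitrary example values.
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(224, 224, 64))

# Standard convolution: 3*3*64*128 weights + 128 biases = 73,856 params.
standard = layers.Conv2D(128, kernel_size=3, padding="same")(inputs)

# Depthwise (3*3*64 = 576) + pointwise (1*1*64*128 = 8,192)
# + 128 biases = 8,896 params, roughly an 8x reduction.
separable = layers.SeparableConv2D(128, kernel_size=3, padding="same")(inputs)

tf.keras.Model(inputs, [standard, separable]).summary()
```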
To further improve the accuracy and speed of x-ray image detection and classification, this paper proposes SVRNet and SVDNet based on the comparison and analysis of the above networks. SVRNet [as shown in Fig. 4(a)] replaces the original convolutions of VGG16 with separable convolutions and then introduces residual units to construct a convolutional neural network, thereby reducing the number of model parameters while increasing network depth and improving accuracy. SVDNet [as shown in Fig. 4(b)] adds separable convolution to VGG16 together with the skip-connection structure of DenseNet. DenseNet [27] connects each layer in the network to all preceding layers to enhance feature reuse and reorganizes all feature maps at the end to maximize resource utilization and compress computation, increasing speed as much as possible while improving the model's classification ability.
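Figure 4 gives the exact layer configurations; as a hedged illustration of the building pattern described above (VGG-style separable convolutions wrapped in a residual shortcut), a minimal Keras sketch of such a block follows. The function name, filter counts, and kernel size are illustrative assumptions, not the published SVRNet specification.

```python
# Illustrative separable-convolution residual block in the spirit of
# SVRNet (separable convolutions + residual unit); the filter count,
# kernel size, and helper name are assumptions, not the paper's spec.
from tensorflow.keras import layers

def sep_conv_residual_block(x, filters, kernel_size=3):
    shortcut = x
    if x.shape[-1] != filters:
        # 1x1 projection so the shortcut matches the block's channels.
        shortcut = layers.Conv2D(filters, 1, padding="same")(x)
    y = layers.SeparableConv2D(filters, kernel_size, padding="same",
                               activation="relu")(x)
    y = layers.SeparableConv2D(filters, kernel_size, padding="same")(y)
    y = layers.add([y, shortcut])  # residual (skip) connection
    return layers.Activation("relu")(y)
```

An SVDNet-style block would instead concatenate each layer's output with all preceding feature maps (e.g., via layers.Concatenate), in the manner of DenseNet.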
Fig. 4 Network structure diagrams: (a) SVRNet structure diagram and (b) SVDNet structure diagram.
The experiments in this article were conducted on the Windows 10 operating system with an NVIDIA RTX 2060 GPU, using the TensorFlow 2.0 and Keras 2.3.1 deep learning frameworks and the Python 3.7.4 language to build the deep convolutional neural networks corresponding to the different models. The specific settings of the hyperparameters in the network are shown in Table 2.
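As a convenience check (not part of the original experiments), the environment described above can be verified with a few version queries:

```python
# Quick sanity check of the environment described above.
import sys
import tensorflow as tf
import keras  # standalone Keras 2.3.1

print("Python:", sys.version.split()[0])   # expected 3.7.4
print("TensorFlow:", tf.__version__)       # expected 2.0.x
print("Keras:", keras.__version__)         # expected 2.3.1
print("GPUs:", tf.config.experimental.list_physical_devices("GPU"))
```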
Table 2 Specific settings of hyperparameters.
The proposed SVRNet and SVDNet models were trained on the training and validation sets with the Adam optimizer, using a learning rate policy in which the learning rate is reduced when learning stagnates for a period of time. The following hyperparameters were used for training: learning rate = 1e−3, number of epochs = 100, and batch size = 8. The loss function and activation function of the network are categorical cross-entropy and ReLU, respectively. Finally, softmax is selected as the classifier to classify images as positive or negative for COVID-19. At the same time, to better compare model performance, VGG16, ResNet50, InceptionV3, and Xception were trained with the same settings.
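These settings map directly onto Keras; the sketch below wires up the stated hyperparameters, using a ReduceLROnPlateau callback for the "reduce when learning stagnates" policy. The model and data variables are placeholders, and the callback's factor and patience values are assumptions, since the text does not specify them.

```python
# Training configuration from Table 2: Adam, lr = 1e-3, 100 epochs,
# batch size 8, categorical cross-entropy loss. `model`, `x_train`,
# `y_train`, `x_val`, and `y_val` are placeholders.
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau

model.compile(optimizer=Adam(learning_rate=1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Reduce the learning rate when the validation loss stagnates;
# factor/patience here are assumed values.
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.1,
                              patience=5, min_lr=1e-6)

history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=100, batch_size=8,
                    callbacks=[reduce_lr])
```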
After the model training experiments, we obtained six models (VGG16, ResNet50, InceptionV3, Xception, SVRNet, and SVDNet) with good detection capabilities for COVID-19 in x-ray images. To accurately evaluate the comprehensive performance of each model, we had all models perform the classification task on the 312 lung x-rays in the test set, detecting the images that are positive and negative for COVID-19. Finally, this paper introduces evaluation metrics for medical image classification (accuracy, sensitivity, specificity, precision, recall, and F1-score), the confusion matrix, and a graph showing how the accuracy varies with the number of training epochs. We calculate the value of each classification metric and make a specific assessment of the detection and classification ability of each model. The metrics are calculated as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN),   (2)

Sensitivity = TP / (TP + FN),   (3)

Specificity = TN / (TN + FP),   (4)

Precision = TP / (TP + FP),   (5)

Recall = TP / (TP + FN).   (6)
From Eqs. (5) and (6), the F1-score can be derived as

F1-score = 2 × Precision × Recall / (Precision + Recall).   (7)
Here TP, FP, TN, and FN represent, respectively, the true positives (the number of positive samples classified into the positive class), false positives (the number of negative samples classified into the positive class), true negatives (the number of negative samples classified into the negative class), and false negatives (the number of positive samples classified into the negative class) in the confusion matrix. As shown in Table 3, substituting the TP, FP, TN, and FN from the confusion matrix into these calculations yields the values of accuracy, sensitivity, specificity, precision, recall, and F1-score, and thus a comprehensive performance evaluation of each model.
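For reference, the sketch below computes all six metrics of Eqs. (2)-(7) directly from the four confusion-matrix counts; the helper name is illustrative, and zero-denominator handling is omitted for brevity.

```python
# Evaluation metrics of Eqs. (2)-(7) from confusion-matrix counts.
# Illustrative helper; assumes no denominator is zero.
def classification_metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # Eq. (2)
    sensitivity = tp / (tp + fn)                        # Eq. (3)
    specificity = tn / (tn + fp)                        # Eq. (4)
    precision = tp / (tp + fp)                          # Eq. (5)
    recall = tp / (tp + fn)                             # Eq. (6)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (7)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision,
            "recall": recall, "f1_score": f1}
```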
Table 3 Parameter meaning correspondence table.
In addition, to compare the number of parameters used by the different models at run time, we added a parameter statistics function (model.summary()) to the program to record the parameter count of each model. The parameter change rate is calculated as follows:

ΔP = P_a − P_b,   R = (ΔP / P_b) × 100%,   (8)

where P_a represents the number of parameters of any model a, P_b represents the number of parameters of any model b, ΔP represents the difference between the number of parameters of model a and model b, and R is the parameter change rate of model a relative to model b.
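Beyond the printed summary, Keras also exposes the total count programmatically through model.count_params(), so Eq. (8) can be evaluated as in the sketch below; model_a and model_b are placeholders for any two of the trained models.

```python
# Parameter change rate of Eq. (8), computed from Keras'
# built-in counters; `model_a` and `model_b` are placeholders.
def parameter_change_rate(model_a, model_b):
    p_a = model_a.count_params()   # total parameters of model a
    p_b = model_b.count_params()   # total parameters of model b
    delta_p = p_a - p_b            # parameter difference
    return delta_p / p_b * 100.0   # change rate, in percent
```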