The basic DCNN consisted of convolution layers, pooling layers and fully connected layers. Convolution layers extracted features with filters. Because stacking filters increases the number of features and thus the risk of overfitting, pooling layers were used to subsample the feature maps and reduce the number of parameters in subsequent layers. Pooling also preserved a degree of spatial invariance. Once all features had been obtained, the fully connected layers integrated them to complete the classification task. In addition to this basic structure, we added nonlinear activation functions, batch normalization, dropout layers and a softmax function to optimize the network. The rectified linear unit (ReLU) was applied as the nonlinear activation function after every convolution layer and every fully connected layer, which could speed up convergence through sparse activation. Batch normalization was placed between each convolution layer and its activation function to normalize the distribution of each batch, which could effectively reduce vanishing gradients [23]. The dropout layer randomly dropped a fraction of units with probability 1-p (p=0.5), which could prevent overfitting by simplifying the network. The softmax function, following the fully connected layers, converted the output to a probability distribution.
Our proposed DCNNs inherited parts of the Visual Geometry Group (VGG) network and the Residual Neural Network (ResNet); their structures are shown in Fig. 2. Network A inherited the VGG design, using small (3 × 3) filters to reduce complexity. It had five convolutional modules and four fully connected layers, and each convolutional module contained two convolutional layers and one max pooling layer. Building on network A, network B inherited from ResNet, with eight ResBlock modules to increase depth while reducing the occurrence of vanishing and exploding gradients. Each ResBlock contained two convolutional layers and one residual path.
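The building blocks described above can be illustrated with a minimal sketch in plain Python. This is not the authors' TensorFlow implementation: the function names are hypothetical, batch normalization is shown without its learnable scale and shift, and rescaling the surviving units in dropout is a common convention that is assumed here rather than taken from the text.

```python
import math
import random


def batch_norm(batch, eps=1e-5):
    """Normalize one channel's activations over a batch to zero mean, unit variance."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


def relu(values):
    """Rectified linear unit: negative activations become zero (sparse output)."""
    return [max(0.0, x) for x in values]


def dropout(values, keep_prob=0.5, rng=random):
    """Zero each unit with probability 1 - keep_prob (training time only).

    Survivors are scaled by 1 / keep_prob so the expected activation is
    unchanged; this "inverted dropout" scaling is an assumption.
    """
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in values]


def softmax(logits):
    """Convert raw scores to a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def residual(block, values):
    """Residual path: add the block's input back onto the block's output."""
    return [f + x for f, x in zip(block(values), values)]


# Ordering used in the text: convolution output -> batch norm -> ReLU.
pre_activations = [2.0, -1.0, 0.5, -3.5]  # stand-in for one conv channel over a batch
hidden = relu(batch_norm(pre_activations))

# Class probabilities from the final fully connected layer's scores.
probabilities = softmax([1.0, 2.0, 3.0])
```

With p = 0.5 the keep and drop probabilities coincide, so the sketch behaves the same whichever convention a framework uses for its dropout argument.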
During training, the Adam optimizer, learning rate decay, weight decay and momentum optimization were applied to accelerate convergence and shorten training time [24]. Cross-entropy was selected as the loss function. Detailed adjustments of hyperparameters are shown in Table 1. The entire process was carried out on a MECHREVO MR LX980 machine equipped with an RTX 2080 GPU, using Google's TensorFlow 2.0 as the backend.
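As a rough illustration of the training components named above, the following plain-Python sketch shows a cross-entropy loss, a learning rate schedule and an Adam update for a single scalar weight. Several points are assumptions rather than details from the paper: the exponential form of the decay, weight decay applied as an L2 term added to the gradient, the class name `AdamScalar`, and every numeric value (the actual hyperparameters are those of Table 1).

```python
import math


def cross_entropy(probs, true_class):
    """Cross-entropy loss for one sample, given softmax probabilities."""
    return -math.log(probs[true_class])


def decayed_learning_rate(initial_lr, decay_rate, step, decay_steps):
    """Exponential decay (assumed form): multiply by decay_rate every decay_steps."""
    return initial_lr * decay_rate ** (step / decay_steps)


class AdamScalar:
    """Adam update for one scalar weight, with optional L2 weight decay."""

    def __init__(self, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.0):
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.weight_decay = weight_decay
        self.m = 0.0  # first-moment (momentum) estimate
        self.v = 0.0  # second-moment estimate
        self.t = 0    # number of updates so far

    def step(self, weight, grad, lr):
        # Assumed variant: the L2 penalty is folded into the gradient.
        grad = grad + self.weight_decay * weight
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)  # bias correction
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return weight - lr * m_hat / (math.sqrt(v_hat) + self.eps)


# Toy problem: minimise (w - 3)^2, whose gradient is 2 * (w - 3).
optimizer = AdamScalar()
w = 0.0
for step in range(2000):
    lr = decayed_learning_rate(0.1, 0.96, step, decay_steps=100)  # placeholder values
    w = optimizer.step(w, 2.0 * (w - 3.0), lr)
```

The first-moment estimate `m` plays the role of momentum inside Adam, and the decaying `lr` takes smaller steps as training approaches a minimum.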

Fig. 2. Sketch maps of the networks. Network A, inherited from VGG-16, consisted of five convolution modules and four fully connected layers. Each module contained two convolution layers (conv) and one max pooling layer (pool). Building on network A, network B added eight ResBlock modules, each with two convolution layers and one residual path (green line).

Table 1. Detailed adjustments of hyperparameters
