3.3. Multi-Column Convolutional Neural Network

The Multi-Column Convolutional Neural Network (MCNN) is a convolutional neural network model built on the idea of Multi-Column Deep Neural Networks (MDNNs), and we use it here to learn the target density map [26]. Because the experimental rice varies in shape and size and the images suffer from perspective distortion, the images contain rice at different scales, so filters with receptive fields of a single size can hardly capture the rice density features at all of these scales. It is more natural to use filters with local receptive fields of different sizes to learn the mapping from the original pixels to the density map. In the MCNN used in this paper, each column uses filters of a different size to model the density maps corresponding to a different scale of rice; for example, filters with larger receptive fields are more useful for modeling the density maps of larger rice. The MCNN structure used in this paper is shown in Figure 4.

Figure 4. MCNN base structure, containing three parallel CNNs whose filters have local receptive fields of different sizes.
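To make the scale argument concrete, the effective receptive field of a column can be computed with the standard recurrence r_l = r_{l-1} + (k_l - 1) x j_{l-1}, j_l = j_{l-1} x s_l, where k_l and s_l are the kernel size and stride of layer l and j_l is the accumulated stride. The short Python sketch below is an illustration only; it assumes the Conv-Pooling-Conv-Pooling column layout described in the next paragraph, with stride-1 convolutions and 2 x 2 max pooling of stride 2.

# Illustration only: effective receptive field of the large-filter (L) and
# small-filter (S) columns, assuming stride-1 convolutions and 2 x 2 max
# pooling of stride 2 in a Conv-Pool-Conv-Pool-Conv-Conv layout.
def receptive_field(layers):
    # layers: list of (kernel_size, stride) pairs from input to output
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the field by (k - 1) input strides
        jump *= s              # accumulated stride relative to the input image
    return rf

col_l = [(9, 1), (2, 2), (7, 1), (2, 2), (7, 1), (7, 1)]  # row L kernels
col_s = [(5, 1), (2, 2), (3, 1), (2, 2), (3, 1), (3, 1)]  # row S kernels

print(receptive_field(col_l))  # 72: each output value sees a 72 x 72 image patch
print(receptive_field(col_s))  # 28: each output value sees a 28 x 28 image patch

Under these assumptions, row L responds to roughly 72 x 72 pixel regions while row S responds to roughly 28 x 28 pixel regions, which is why the large-filter column is better suited to larger rice.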

To reduce the computation time, we use the same Conv-Pooling-Conv-Pooling structure for all columns, which contains four convolutional layers in total; only the size and number of filters vary. The three parallel CNNs extract large-, medium-, and small-scale features from top to bottom and are referred to as rows L, M, and S, respectively. The first convolutional layer in row L uses a 9 × 9 filter with 16 channels; the second is 7 × 7 with 32 channels; the third is also 7 × 7 with 16 channels; and the last is likewise 7 × 7. The first convolutional layer in row M uses a 7 × 7 filter with 20 channels. The first convolutional layer in row S uses a 5 × 5 filter with 24 channels, and its last three convolutional layers use 3 × 3 filters with 48, 24, and 12 channels, respectively. All three CNNs apply max pooling over 2 × 2 regions and use rectified linear units (ReLUs) as the activation function because of their good performance in CNNs. To reduce the computational complexity (the number of parameters to be optimized), columns with larger filters use a smaller number of them [27].

In our experiments, the output feature maps of all three CNNs are stacked and mapped onto the density map; this mapping is performed by a filter of size 1 × 1. Note that traditional CNNs usually include an image preprocessing step in which images of different sizes are rescaled to a common size by stretching or cropping [28]. In this paper, the input images keep their original size, because resizing them to a common size would introduce additional distortion into the density map [29]. Apart from the filters of different sizes in its columns, the remaining difference between the MCNN used in this paper and an ordinary MDNN is that the MCNN combines the outputs of the different columns with learnable network weights, whereas in previously proposed MDNNs the outputs are simply averaged.
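For reference, a minimal sketch of this three-column architecture in PyTorch is given below. It is not the authors' implementation: the kernel sizes and channel counts explicitly stated in the text are used directly, while the values the text does not give (the channel count of the last layer in row L, the layers of row M after the first, and a single-channel grayscale input) are assumptions borrowed from the standard MCNN of [26].

# A minimal PyTorch sketch of the three-column MCNN described above (not the
# authors' code). Values marked "assumed" are not stated in the text and are
# taken from the standard MCNN of [26]; a single-channel (grayscale) input is
# also assumed.
import torch
import torch.nn as nn

def make_column(kernels, channels, in_channels=1):
    # One column: Conv-Pool-Conv-Pool-Conv-Conv, ReLU after every convolution,
    # 2 x 2 max pooling after the first two convolutions.
    layers, c_in = [], in_channels
    for i, (k, c_out) in enumerate(zip(kernels, channels)):
        layers += [nn.Conv2d(c_in, c_out, kernel_size=k, padding=k // 2),
                   nn.ReLU(inplace=True)]
        if i < 2:
            layers.append(nn.MaxPool2d(2))
        c_in = c_out
    return nn.Sequential(*layers)

class MCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.col_l = make_column([9, 7, 7, 7], [16, 32, 16, 8])   # row L; last channel count (8) assumed
        self.col_m = make_column([7, 5, 5, 5], [20, 40, 20, 10])  # row M; all layers after the first assumed
        self.col_s = make_column([5, 3, 3, 3], [24, 48, 24, 12])  # row S
        # 1 x 1 convolution: fuses the stacked column outputs into the density
        # map with learnable weights (rather than simple averaging as in an MDNN).
        self.fuse = nn.Conv2d(8 + 10 + 12, 1, kernel_size=1)

    def forward(self, x):
        features = torch.cat([self.col_l(x), self.col_m(x), self.col_s(x)], dim=1)
        return self.fuse(features)

if __name__ == "__main__":
    model = MCNN()
    img = torch.randn(1, 1, 300, 420)   # original (non-square) image size is kept
    print(model(img).shape)             # torch.Size([1, 1, 75, 105])

Because all convolutions in this sketch are padded and only the two pooling steps change the resolution, the network accepts images at their original size and outputs a density map whose height and width are one quarter of the input's.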
