2.2 CNN-based supervised subtomogram classification

Min Xu, Xiaoqi Chai, Hariank Muthakana, Xiaodan Liang, Ge Yang, Tzviya Zeev-Ben-Mordehai, Eric P. Xing

When using a CNN for subtomogram classification, the input of the CNN is a 3D subtomogram f, a cubic 3D image defined as a function f: ℝ³ → ℝ. The output of the CNN is a vector o := (o1, …, oL), giving the probability that f belongs to each of the L classes defined in the training data. Each class corresponds to one macromolecular complex. Given o, the predicted class is argmax_i oi.
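The mapping from the network's raw outputs to a predicted class can be sketched in plain numpy. The logit values below are illustrative, not taken from the paper:

```python
import numpy as np

def softmax(logits):
    """Convert raw network outputs (logits) into class probabilities."""
    z = logits - np.max(logits)      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits for L = 4 macromolecular-complex classes.
logits = np.array([1.2, 0.3, 2.8, -0.5])
o = softmax(logits)                  # o = (o1, ..., oL), sums to 1
predicted_class = int(np.argmax(o))  # argmax_i oi

assert np.isclose(o.sum(), 1.0)
assert predicted_class == 2          # the class with the largest logit
```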

In this article, we propose two 3D CNN models, based on GoogLeNet and VGGNet respectively, for supervised subtomogram classification, and adapt them for structural feature extraction.

In this section, we propose a tailored 3D variant of the inception network (Szegedy et al., 2016a), denoted Inception3D. The inception network is a recent, successful CNN architecture that achieves competitive performance at relatively low computational cost (Szegedy et al., 2016a). The architecture of our model is shown in Figure 1a. It contains one inception module (Szegedy et al., 2016a), in which 1 × 1 × 1, 3 × 3 × 3, and 5 × 5 × 5 3D filters are combined with a 2 × 2 × 2 3D max pooling layer. The filters are applied in parallel and their outputs concatenated, so that features extracted at multiple scales by filters of different sizes are presented simultaneously to the following layer. The 1 × 1 × 1 filters before the 3 × 3 × 3 and 5 × 5 × 5 convolutions serve as dimension reduction. The inception module is followed by a 2 × 2 × 2 average pooling layer, then by a fully connected output layer whose number of units equals the number of structure classes. All hidden layers are equipped with rectified linear (ReLU) activations. The output is a fully connected layer followed by a softmax activation layer.
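The parallel-branch concatenation at the heart of the inception module can be illustrated with numpy shape arithmetic. The branch channel counts below are illustrative assumptions, not the exact configuration of Figure 1a:

```python
import numpy as np

# Illustrative channel counts for the four parallel branches.
D = H = W = 16  # spatial size of the input feature map
branch_channels = {"1x1x1": 16, "3x3x3": 32, "5x5x5": 8, "maxpool_proj": 8}

# With 'same' padding, every branch preserves the D x H x W spatial grid,
# so the branch outputs differ only in their channel dimension.
branches = [np.zeros((D, H, W, c)) for c in branch_channels.values()]

# The inception module concatenates the branch outputs along the channel
# axis, presenting multi-scale features to the next layer simultaneously.
merged = np.concatenate(branches, axis=-1)

assert merged.shape == (16, 16, 16, sum(branch_channels.values()))  # 64 channels
```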

Architectures of our CNN models. Both networks stack multiple layers; each box represents one layer, with its type and configuration listed inside. For example, ‘32-5 × 5 × 5-1 Conv’ denotes a 3D convolutional layer with 32 filters of size 5 × 5 × 5 and stride 1. ‘2 × 2 × 2-2 MaxPool’ denotes a 3D max pooling layer applying the max operation over 2 × 2 × 2 regions with stride 2. ‘FC-512’ and ‘FC-L’ denote fully connected linear layers with 512 and L neurons, respectively, where every neuron is connected to every output of the previous layer; L is the number of classes in the training dataset. ‘ReLU’ and ‘Softmax’ denote different types of activation layers.

In this section, we propose a tailored 3D variant of VGGNet (Simonyan and Zisserman, 2014), another CNN architecture that achieved top classification accuracy on popular image benchmark datasets. Our model is denoted Deep Small Receptive Field (DSRF3D). The architecture of our model is shown in Figure 1b. Compared with the Inception3D model, DSRF3D features deeper layers and very small 3D convolution filters of size 3 × 3 × 3. Stacking multiple small filters has the same effect as one large filter, with the advantages of fewer parameters to train and more non-linearity (Simonyan and Zisserman, 2014). The architecture consists of four 3 × 3 × 3 3D convolutional layers and two 2 × 2 × 2 3D max pooling layers, followed by two fully connected layers, then by a fully connected output layer whose number of units equals the number of structure classes. All hidden layers are equipped with ReLU activation layers. The output is a fully connected layer with a softmax activation layer.
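The equivalence between stacked small filters and one large filter comes down to simple arithmetic: two stacked 3 × 3 × 3 convolutions cover the same 5 × 5 × 5 receptive field as a single 5 × 5 × 5 convolution, but with fewer weights and an extra ReLU in between. A sketch of the comparison, using an illustrative channel count rather than the paper's exact configuration:

```python
# Why stacking small 3x3x3 filters is cheaper than one large filter.

C = 32  # assume C input and C output channels for each layer (illustrative)

def conv3d_params(k, c_in, c_out):
    """Number of weights in a k x k x k 3D convolution (biases omitted)."""
    return k ** 3 * c_in * c_out

def receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return layers * (kernel - 1) + 1

# Two stacked 3x3x3 layers see the same 5x5x5 region as one 5x5x5 layer...
assert receptive_field(3, 2) == receptive_field(5, 1) == 5

# ...but with far fewer parameters: 2 * 27 * C^2 vs 125 * C^2.
stacked = 2 * conv3d_params(3, C, C)  # 54 * C**2
single = conv3d_params(5, C, C)       # 125 * C**2
assert stacked < single
```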
