2.5.3 Deep learning method

A self-built long short-term memory network with an attention mechanism (ATT-LSTM) was used to evaluate disease severity. The architecture of the designed ATT-LSTM model is shown in Figure 2.

Figure 2. The architecture of the ATT-LSTM model.

The first block was the ATT block, which was realized by two dense layers:

$$Y_{ATT} = X \otimes f_{relu}\left(W_2 f_{relu}(W_1 X + b_1) + b_2\right)$$

where $Y_{ATT}$ is the output of the ATT block, $X$ denotes the input data, $W_1$, $W_2$ and $b_1$, $b_2$ represent the weights and biases of the two dense layers, $\otimes$ indicates matrix multiplication, and $f_{relu}$ is the ReLU activation function. The number of neurons in the first dense layer was set to 128, and that of the second dense layer was set to the number of input variables. The attention mechanism helps the model assign different weights to the input variables, making the network pay more attention to important information (Li et al., 2020). Thus, the ATT block could also be used to extract feature variables: the value of $f_{relu}(W_2 f_{relu}(W_1 X + b_1) + b_2)$ was defined as the ATT weight, and variables with larger weight values are more important. In this study, the important wavelengths were screened based on the ATT weight.
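A minimal sketch of the ATT block in MXNet Gluon is given below. The class name, the input shape of (batch, number of wavelengths), and the per-variable (element-wise) interpretation of the reweighting step are illustrative assumptions, not the authors' exact code.

```python
from mxnet.gluon import nn

class ATTBlock(nn.HybridBlock):
    """Two dense layers that produce one attention weight per input variable."""
    def __init__(self, n_inputs, **kwargs):
        super().__init__(**kwargs)
        self.fc1 = nn.Dense(128, activation='relu')       # first dense layer, 128 neurons
        self.fc2 = nn.Dense(n_inputs, activation='relu')  # second dense layer, one output per input variable

    def hybrid_forward(self, F, x):
        # ATT weight: f_relu(W2 f_relu(W1 X + b1) + b2)
        att_weight = self.fc2(self.fc1(x))
        # Reweight the input; larger weights mark more important wavelengths
        return x * att_weight
```

After training, the `att_weight` values can be read out to rank wavelengths by importance, which is how the wavelength screening described above would be performed.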

The second part was the LSTM block. It contained two LSTM units with 128 and 32 hidden units, respectively, and each LSTM unit was followed by a max pooling layer and a batch normalization layer. LSTM is a special recurrent neural network that stores long-term states by adding a memory cell. The cell can decide which states are forgotten or retained, thus mitigating the vanishing and exploding gradient problems of standard recurrent neural networks (Zhelev and Avresky, 2019). It is therefore well suited to processing time-series data. A detailed introduction to LSTM can be found at http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
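A corresponding Gluon sketch of the LSTM block follows. The sequence layout, the pooling size of 2, and the transposes needed to move between the LSTM's (batch, time, channel) layout and the pooling layer's (batch, channel, width) layout are assumptions the text does not specify.

```python
from mxnet.gluon import nn, rnn

class LSTMBlock(nn.HybridBlock):
    """Two LSTM units (128 and 32 hidden units), each followed by
    max pooling and batch normalization."""
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.lstm1 = rnn.LSTM(128, layout='NTC')  # first LSTM unit
        self.lstm2 = rnn.LSTM(32, layout='NTC')   # second LSTM unit
        self.pool1 = nn.MaxPool1D(pool_size=2)    # pool size assumed
        self.pool2 = nn.MaxPool1D(pool_size=2)
        self.bn1 = nn.BatchNorm()
        self.bn2 = nn.BatchNorm()

    def hybrid_forward(self, F, x):
        # x: (batch, time, features)
        y = self.lstm1(x)                   # (N, T, 128)
        y = F.transpose(y, axes=(0, 2, 1))  # to (N, C, W) for pooling
        y = self.bn1(self.pool1(y))         # max pooling + batch normalization
        y = F.transpose(y, axes=(0, 2, 1))  # back to (N, T, C)
        y = self.lstm2(y)                   # (N, T/2, 32)
        y = F.transpose(y, axes=(0, 2, 1))
        y = self.bn2(self.pool2(y))
        return F.flatten(y)
```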

The LSTM block was followed by a fully connected layer containing 128 neurons, and a dropout layer was used to prevent overfitting. Finally, an output layer predicted the category; in this study, the output was the disease severity, with six categories.
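Putting the pieces together, the full network could be assembled as sketched below. The number of input wavelengths (200), the dropout rate (0.5), and the treatment of the reweighted spectrum as a one-feature sequence for the LSTM are placeholders, since the text does not state them.

```python
from mxnet.gluon import nn

n_bands = 200  # hypothetical number of input wavelengths

net = nn.HybridSequential()
net.add(
    ATTBlock(n_bands),                                       # attention reweighting
    nn.HybridLambda(lambda F, x: F.expand_dims(x, axis=2)),  # (N, bands) -> (N, bands, 1) sequence
    LSTMBlock(),                                             # two LSTM units + pooling + batch norm
    nn.Dense(128, activation='relu'),                        # fully connected layer, 128 neurons
    nn.Dropout(0.5),                                         # dropout rate assumed
    nn.Dense(6),                                             # six disease-severity categories
)
```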

The deep learning model was implemented in the MXNET framework. The Softmax cross-entropy loss function and the adaptive moment estimation (Adam) optimizer were applied to train the model. In the training phase, the batch size was set to 20, and a dynamic learning rate was used: a relatively large learning rate of 0.001 was set to speed up training for the first 500 epochs, and it was then reduced to 0.0001 for the next 300 epochs.
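Under the stated settings, the training loop could be sketched as follows. Here `train_data` is a hypothetical Gluon DataLoader of (spectrum, label) batches, and the Xavier initializer is an assumption, not stated in the text.

```python
import mxnet as mx
from mxnet import autograd, gluon

net.initialize(mx.init.Xavier())  # initializer assumed
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': 0.001})

for epoch in range(800):                    # 500 + 300 epochs
    if epoch == 500:
        trainer.set_learning_rate(0.0001)   # reduce the learning rate after 500 epochs
    for X, y in train_data:                 # hypothetical DataLoader of (spectrum, label)
        with autograd.record():
            loss = loss_fn(net(X), y)
        loss.backward()
        trainer.step(batch_size=20)         # batch size of 20
```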
