As a variant of the LSTM, the GRU has been shown to produce competitive results comparable to those of LSTM models. GRUs, proposed by Chung et al. in 2014, differ from LSTMs only in how their gates monitor the flow of information from previous time steps, whereas the gating mechanisms in LSTMs control the flow of information within an internal cell unit [48]. GRUs are often preferred for problems involving long-term memory and the gradient problems of backpropagation because they can achieve results comparable to LSTMs while being easier to train and offering improved training efficiency.

Like the LSTM, the GRU controls the information flow through "gates", but it has one gate fewer than the LSTM and no cell state. A detailed GRU structure introduced in [52,54] is analyzed as shown in Figure 2 and is the structure considered for implementation in this study. The input and output structure of the GRU is the same as that of an ordinary RNN. There is a current input $x^t$ and a hidden state $h^{t-1}$ passed down from the previous node, which contains the relevant information of the previous node. Combining $x^t$ and $h^{t-1}$, the GRU produces the output $y^t$ of the current hidden node and the hidden state $h^t$ passed on to the next node.
Figure 2. The structure of the gated recurrent unit (GRU).
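To make the input/output structure described above concrete, the following is a minimal sketch using PyTorch's GRUCell as a stand-in (the paper does not prescribe a framework, and the dimensions here are purely illustrative): the cell receives the current input and the previous hidden state and returns the new hidden state passed to the next node.

```python
# Minimal sketch of the GRU input/output structure, assuming PyTorch as the
# framework (an assumption; the study's implementation details are not shown here).
import torch
import torch.nn as nn

input_size, hidden_size, batch = 8, 16, 4   # illustrative dimensions
cell = nn.GRUCell(input_size, hidden_size)

x_t = torch.randn(batch, input_size)        # current input x^t
h_prev = torch.zeros(batch, hidden_size)    # hidden state h^{t-1} from the previous node

h_t = cell(x_t, h_prev)                     # hidden state h^t passed to the next node
print(h_t.shape)                            # torch.Size([4, 16])
```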
According to the principle of the GRU, there are two gates in the GRU structure, namely the reset gate $r$ and the update gate $z$, and both are determined by the hidden state $h^{t-1}$ passed down from the previous node and the input $x^t$ of the current node. Their relationship can be expressed by Equations (8) and (9):

$$r = \sigma\left(W_r \cdot \left[h^{t-1}, x^t\right]\right) \qquad (8)$$

$$z = \sigma\left(W_z \cdot \left[h^{t-1}, x^t\right]\right) \qquad (9)$$
where $r$ stands for the reset gate, $z$ represents the update gate, and $\sigma$ is the sigmoid function that transforms the data into a value in the range 0–1, thus acting as a gating signal. $W_r$ and $W_z$ are the weight matrices of the reset gate and the update gate, respectively, while $[\cdot,\cdot]$ denotes the operation that concatenates two vectors. After obtaining the gating signals, the reset gate is used to obtain the "reset" data $h^{t-1\prime}$, which is then concatenated with the input $x^t$; finally, the data are scaled to $[-1, 1]$ by the tanh activation, as shown in Equations (10) and (11):

$$h^{t-1\prime} = h^{t-1} \odot r \qquad (10)$$

$$h^{\prime} = \tanh\left(W \cdot \left[h^{t-1\prime}, x^t\right]\right) \qquad (11)$$
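As a hedged illustration of Equations (8)–(11), the sketch below computes the reset gate, the update gate, and the candidate state in NumPy. The bias-free parameterization, the weight shapes, and the function name are assumptions made for clarity only, not the study's exact implementation.

```python
# NumPy sketch of Equations (8)-(11): reset gate r, update gate z, and the
# candidate state h' built from the "reset" hidden state. Shapes and the
# absence of bias terms are illustrative assumptions.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_gates_and_candidate(x_t, h_prev, W_r, W_z, W):
    concat = np.concatenate([h_prev, x_t])                 # [h^{t-1}, x^t]
    r = sigmoid(W_r @ concat)                              # Eq. (8): reset gate
    z = sigmoid(W_z @ concat)                              # Eq. (9): update gate
    h_reset = h_prev * r                                   # Eq. (10): h^{t-1'} = h^{t-1} ⊙ r
    h_cand = np.tanh(W @ np.concatenate([h_reset, x_t]))   # Eq. (11): candidate state h'
    return r, z, h_cand
```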
where $h^{t-1\prime}$ is the Hadamard (element-wise) product of the reset gate and the content of the hidden node; similarly, $W$ is the corresponding weight matrix. The candidate state $h^{\prime}$ mainly contains the data of the current input $x^t$. Adding $h^{\prime}$ to the current hidden state in a targeted way is equivalent to remembering the state at the current moment.
Lastly, the memory state is updated by employing the content of the update gate to achieve the forgetting and selective-memory functions:

$$h^{t} = (1 - z) \odot h^{t-1} + z \odot h^{\prime}$$

The gating signal $z$ ranges from 0 to 1: the closer the signal is to 1, the more of the corresponding data are remembered, and the closer it is to 0, the more are "forgotten". Here, $(1 - z) \odot h^{t-1}$ means selective "forgetting" of the original hidden state, and $z \odot h^{\prime}$ indicates selective "memory" of the candidate state containing the current node's information. The output is often obtained by transforming $h^{t}$, i.e., $y^{t} = \sigma\left(W_{o}\, h^{t}\right)$, where $\sigma$ is the sigmoid function and $W_{o}$ is the corresponding weight matrix.
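The following self-contained NumPy sketch pulls Equations (8)–(11) and the final memory update together into one full GRU step, with the sigmoid output projection described above. The shapes, the bias-free form, and the names (e.g. gru_step, W_o) are illustrative assumptions rather than the study's exact code.

```python
# Self-contained NumPy sketch of one full GRU step:
# gates (Eqs. 8-9), candidate state (Eqs. 10-11), and the memory update
# h^t = (1 - z) ⊙ h^{t-1} + z ⊙ h'. Output projection W_o is an assumption.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_r, W_z, W, W_o):
    concat = np.concatenate([h_prev, x_t])                 # [h^{t-1}, x^t]
    r = sigmoid(W_r @ concat)                              # Eq. (8): reset gate
    z = sigmoid(W_z @ concat)                              # Eq. (9): update gate
    h_reset = h_prev * r                                   # Eq. (10): "reset" hidden state
    h_cand = np.tanh(W @ np.concatenate([h_reset, x_t]))   # Eq. (11): candidate h'
    h_t = (1.0 - z) * h_prev + z * h_cand                  # selective forgetting / memory
    y_t = sigmoid(W_o @ h_t)                               # output of the current node
    return y_t, h_t

# Example usage with illustrative dimensions
hidden, inputs = 16, 8
rng = np.random.default_rng(0)
W_r = rng.standard_normal((hidden, hidden + inputs))
W_z = rng.standard_normal((hidden, hidden + inputs))
W   = rng.standard_normal((hidden, hidden + inputs))
W_o = rng.standard_normal((hidden, hidden))
y_t, h_t = gru_step(rng.standard_normal(inputs), np.zeros(hidden), W_r, W_z, W, W_o)
```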