2.2. The Principle of GRU

As a variant of LSTM, GRU has been shown to produce results comparable to those of LSTM models. GRUs, proposed by Chung et al. in 2014, differ from LSTMs only in how their gates monitor the flow of information from earlier time steps, whereas the gating mechanisms in LSTMs control the flow of information within an internal cell unit [48]. GRUs are often preferred for problems involving long-term memory and gradients in backpropagation, as they achieve results comparable to LSTM while being easier to train and offering improved training efficiency. Like LSTM, GRU controls the information flow through “gates”, but with one gate fewer than LSTM and without cell states. A detailed GRU structure introduced in [52,54] is analyzed as shown in Figure 2 and is adopted for implementation in this study. The input and output structure of GRU is the same as that of a standard RNN: there is a current input Xt and a hidden state Ht−1 passed down from the previous node, which contains the relevant information of that node. Combining Xt and Ht−1, the GRU obtains the output Yt of the current hidden node and the hidden state Ht that is passed to the next node.

Figure 2. The structure of the gated recurrent unit (GRU).

According to the principle of GRU, there are two gates in the GRU structure, namely the reset gate and the update gate. Both gates are determined by the hidden state Ht−1 passed down from the previous node and the input Xt of the current node; their relationship is given by Equations (8) and (9).
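Assuming the standard GRU formulation, which is consistent with the symbol definitions that follow, Equations (8) and (9) can be written as

R = σ(Wr·[Ht−1, Xt])   (8)

Z = σ(Wz·[Ht−1, Xt])   (9)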

where R stands for the reset gate, Z represents the update gate, and σ() is a sigmoid function that maps the data to a value in the range of 0–1, thus acting as a gating signal. Wr and Wz are the weight matrices of the reset gate and the update gate, respectively, while [Ht−1, Xt] denotes the operation that concatenates the two vectors. After the gating signals are obtained, the reset gate is used to obtain the “reset” version of Ht−1, which is then concatenated with the input Xt; finally, the data are scaled to [−1, 1] by the tanh activation, as described in Equations (10) and (11).
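Under the same assumption of the standard GRU formulation, the reset and candidate-state computations of Equations (10) and (11) can be written as

Ht−1′ = R ⊙ Ht−1   (10)

H̃ = tanh(W·[Ht−1′, Xt])   (11)

where H̃ denotes the candidate hidden state built from the reset information and the current input.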

where ⊙ denotes the Hadamard product of the reset gate and the content of the hidden node; similarly, W is the corresponding weight matrix. The candidate state H̃ mainly contains the data of the current input Xt, so adding H̃ to the hidden state in a targeted way is equivalent to memorizing the state at the current moment.

Lastly, the memory state is updated by employing the content of the update gate Z to achieve the forgetting and selective-memory functions. The gating signal Z ranges from 0 to 1: values closer to 1 tend to be remembered, while those closer to 0 tend to be “forgotten”. Here, (1−Z)⊙Ht−1 denotes selective “forgetting” of the original hidden state, Z⊙H̃ indicates selective “memory” of the candidate state H̃ containing the current node information, σ() is the sigmoid function, W is the corresponding weight matrix, and the output Yt is usually obtained by transforming Ht.
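Assuming the standard GRU update, and consistent with the description that Yt is obtained by transforming Ht through σ() and the weight matrix W, the final state update and output can be written as

Ht = (1 − Z) ⊙ Ht−1 + Z ⊙ H̃

Yt = σ(W·Ht)

As a concrete illustration, the following is a minimal NumPy sketch of a single GRU step based on the equations above; the function and variable names, shapes, and the omission of bias terms are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    # Sigmoid gating function sigma(), mapping values into the range 0-1.
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_r, W_z, W_h, W_y):
    """One GRU step following Equations (8)-(11) and the update above.

    x_t    : current input X_t, shape (input_dim,)
    h_prev : previous hidden state H_{t-1}, shape (hidden_dim,)
    W_r, W_z, W_h : reset-gate, update-gate, and candidate weight matrices,
                    each of shape (hidden_dim, hidden_dim + input_dim)
    W_y    : output weight matrix, shape (output_dim, hidden_dim)
    """
    concat = np.concatenate([h_prev, x_t])                    # [H_{t-1}, X_t]
    r = sigmoid(W_r @ concat)                                 # reset gate R, Eq. (8)
    z = sigmoid(W_z @ concat)                                 # update gate Z, Eq. (9)
    reset_h = r * h_prev                                      # "reset" hidden state, Eq. (10)
    h_tilde = np.tanh(W_h @ np.concatenate([reset_h, x_t]))   # candidate state, Eq. (11)
    h_t = (1.0 - z) * h_prev + z * h_tilde                    # selective forgetting / memory
    y_t = sigmoid(W_y @ h_t)                                  # output Y_t derived from H_t
    return y_t, h_t
```

For example, with input_dim = 3, hidden_dim = 4, and output_dim = 2, calling gru_step with randomly initialized weight matrices of shapes (4, 7), (4, 7), (4, 7), and (2, 4) returns an output vector y_t of length 2 and an updated hidden state h_t of length 4, which is then passed to the next time step.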
