Gated recurrent unit

Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al.[1] The GRU is like a long short-term memory (LSTM) with a forget gate,[2] but has fewer parameters than LSTM, as it lacks an output gate.[3] GRU's performance on certain tasks of polyphonic music modeling, speech signal modeling and natural language processing was found to be similar to that of LSTM.[4][5] GRUs have been shown to exhibit better performance on certain smaller and less frequent datasets.[6][7]

Architecture

There are several variations on the full gated unit, with gating done using the previous hidden state and the bias in various combinations, and a simplified form called the minimal gated unit.[8]

The operator $\odot$ denotes the Hadamard product in the following.

Fully gated unit

[Figure: Gated Recurrent Unit, fully gated version]

Initially, for $t = 0$, the output vector is $h_0 = 0$.

  $z_t = \sigma_g(W_z x_t + U_z h_{t-1} + b_z)$
  $r_t = \sigma_g(W_r x_t + U_r h_{t-1} + b_r)$
  $\hat{h}_t = \phi_h(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)$
  $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t$

Variables

  • $x_t$: input vector
  • $h_t$: output vector
  • $\hat{h}_t$: candidate activation vector
  • $z_t$: update gate vector
  • $r_t$: reset gate vector
  • $W$, $U$ and $b$: parameter matrices and vector

Activation functions

  • $\sigma_g$: the original is a sigmoid function.
  • $\phi_h$: the original is a hyperbolic tangent.

Alternative activation functions are possible, provided that $\sigma_g(x) \in [0, 1]$.
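To make the definitions above concrete, the following Python/NumPy sketch performs one forward step of the fully gated unit, using the sigmoid for $\sigma_g$ and the hyperbolic tangent for $\phi_h$. The function name gru_step, the parameter layout, and the dimensions in the usage example are illustrative assumptions, not part of the original formulation.

    import numpy as np

    def sigmoid(x):
        # sigma_g in the formulas above: logistic sigmoid, values in [0, 1]
        return 1.0 / (1.0 + np.exp(-x))

    def gru_step(x_t, h_prev, W_z, U_z, b_z, W_r, U_r, b_r, W_h, U_h, b_h):
        # One forward step of the fully gated unit; '*' is the Hadamard product.
        z_t = sigmoid(W_z @ x_t + U_z @ h_prev + b_z)             # update gate
        r_t = sigmoid(W_r @ x_t + U_r @ h_prev + b_r)             # reset gate
        h_hat = np.tanh(W_h @ x_t + U_h @ (r_t * h_prev) + b_h)   # candidate activation
        return (1.0 - z_t) * h_prev + z_t * h_hat                 # new output vector h_t

    # Usage example with assumed sizes: 4-dimensional inputs, 3-dimensional hidden state.
    rng = np.random.default_rng(0)
    d_in, d_h = 4, 3
    shapes = [(d_h, d_in), (d_h, d_h), (d_h,)] * 3      # W, U, b for z, r, h
    params = [rng.standard_normal(s) * 0.1 for s in shapes]
    h = np.zeros(d_h)                                   # initially, for t = 0, h_0 = 0
    for x in rng.standard_normal((5, d_in)):            # a short input sequence
        h = gru_step(x, h, *params)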

Alternate forms can be created by changing $z_t$ and $r_t$;[9] a code sketch of each variant follows the list.

  • Type 1, each gate depends only on the previous hidden state and the bias.
    $z_t = \sigma_g(U_z h_{t-1} + b_z)$
    $r_t = \sigma_g(U_r h_{t-1} + b_r)$
  • Type 2, each gate depends only on the previous hidden state.
    $z_t = \sigma_g(U_z h_{t-1})$
    $r_t = \sigma_g(U_r h_{t-1})$
  • Type 3, each gate is computed using only the bias.
    $z_t = \sigma_g(b_z)$
    $r_t = \sigma_g(b_r)$
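As a rough sketch of the three gate variants listed above (in the same assumed NumPy style, with illustrative function names), only the computation of the two gates changes; the candidate activation and the output update remain those of the fully gated unit.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Type 1: each gate uses the previous hidden state and the bias.
    def gates_type1(h_prev, U_z, b_z, U_r, b_r):
        return sigmoid(U_z @ h_prev + b_z), sigmoid(U_r @ h_prev + b_r)

    # Type 2: each gate uses only the previous hidden state.
    def gates_type2(h_prev, U_z, U_r):
        return sigmoid(U_z @ h_prev), sigmoid(U_r @ h_prev)

    # Type 3: each gate is computed from its bias alone.
    def gates_type3(b_z, b_r):
        return sigmoid(b_z), sigmoid(b_r)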

Minimal gated unit

The minimal gated unit (MGU) is similar to the fully gated unit, except that the update and reset gate vectors are merged into a single forget gate. This also implies that the equation for the output vector must be changed (a code sketch follows the variable list below):[10]

  $f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f)$
  $\hat{h}_t = \phi_h(W_h x_t + U_h (f_t \odot h_{t-1}) + b_h)$
  $h_t = (1 - f_t) \odot h_{t-1} + f_t \odot \hat{h}_t$

Variables

  • $x_t$: input vector
  • $h_t$: output vector
  • $\hat{h}_t$: candidate activation vector
  • $f_t$: forget vector
  • $W$, $U$ and $b$: parameter matrices and vector
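The following Python/NumPy sketch mirrors the three MGU equations above; the function name mgu_step, the helper sigmoid, and all shapes are illustrative assumptions rather than part of the cited formulation.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def mgu_step(x_t, h_prev, W_f, U_f, b_f, W_h, U_h, b_h):
        # One forward step of the minimal gated unit; '*' is the Hadamard product.
        f_t = sigmoid(W_f @ x_t + U_f @ h_prev + b_f)             # forget gate
        h_hat = np.tanh(W_h @ x_t + U_h @ (f_t * h_prev) + b_h)   # candidate activation
        return (1.0 - f_t) * h_prev + f_t * h_hat                 # new output vector h_t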

References

  1. Cho, Kyunghyun; van Merrienboer, Bart; Gulcehre, Caglar; Bahdanau, Dzmitry; Bougares, Fethi; Schwenk, Holger; Bengio, Yoshua (2014). "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation". arXiv:1406.1078 [cs.CL].
  2. Felix Gers; Jürgen Schmidhuber; Fred Cummins (1999). "Learning to Forget: Continual Prediction with LSTM". Proc. ICANN'99, IEE, London. 1999: 850–855. doi:10.1049/cp:19991218. ISBN 0-85296-721-7.
  3. "Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano – WildML". Wildml.com. 2015-10-27. Retrieved May 18, 2016.
  4. Ravanelli, Mirco; Brakel, Philemon; Omologo, Maurizio; Bengio, Yoshua (2018). "Light Gated Recurrent Units for Speech Recognition". IEEE Transactions on Emerging Topics in Computational Intelligence. 2 (2): 92–102. arXiv:1803.10225. doi:10.1109/TETCI.2017.2762739. S2CID 4402991.
  5. Su, Yuanhang; Kuo, Jay (2019). "On Extended Long Short-term Memory and Dependent Bidirectional Recurrent Neural Network". arXiv:1803.01686.
  6. Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014). "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling". arXiv:1412.3555 [cs.NE].
  7. Gruber, N.; Jockisch, A. (2020), "Are GRU cells more specific and LSTM cells more sensitive in motive classification of text?", Frontiers in Artificial Intelligence, 3, doi:10.3389/frai.2020.00040, S2CID 220252321
  8. Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014). "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling". arXiv:1412.3555 [cs.NE].
  9. Dey, Rahul; Salem, Fathi M. (2017-01-20). "Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks". arXiv:1701.05923 [cs.NE].
  10. Heck, Joel; Salem, Fathi M. (2017-01-12). "Simplified Minimal Gated Unit Variations for Recurrent Neural Networks". arXiv:1701.03452 [cs.NE].