First, let's detail how the gates are computed:
As detailed in the previous equations, the three gates (forget, input, and output) are computed using the same principle: by multiplying a weight matrix (W) by the concatenation of the previous output (h<t-1>) and the current input (x<t>). Notice that the activation function is the sigmoid (σ); as a consequence, the gate values always lie between 0 and 1.
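In the standard LSTM formulation, with bias vectors b_f, b_i, and b_o, the gates are:

f<t> = σ(W_f · [h<t-1>, x<t>] + b_f)
i<t> = σ(W_i · [h<t-1>, x<t>] + b_i)
o<t> = σ(W_o · [h<t-1>, x<t>] + b_o)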
The candidate state (c'<t>) is computed in a similar fashion. However, the activation function used is a hyperbolic tangent instead of the sigmoid:
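c'<t> = tanh(W_c · [h<t-1>, x<t>] + b_c)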
Notice that this formula is exactly the same as the one used to compute h<t> in the basic RNN architecture. However, h<t> was the hidden state, whereas here we are computing the candidate cell state. To compute the new cell state, we combine the previous one with the candidate cell state; the two are gated by the forget and input gates, respectively:
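c<t> = f<t> ⊙ c<t-1> + i<t> ⊙ c'<t>

Here, ⊙ denotes element-wise multiplication: the forget gate scales how much of the previous cell state is kept, while the input gate scales how much of the candidate is admitted.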
Finally, the LSTM hidden state (output) is computed from the cell state, which is passed through a tanh activation and gated by the output gate:
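h<t> = o<t> ⊙ tanh(c<t>)

To make the full data flow concrete, the following is a minimal NumPy sketch of a single LSTM step that implements the equations above. The function and parameter names (lstm_step, W_f, b_f, and so on) are illustrative choices, not taken from any particular library:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_o, W_c, b_f, b_i, b_o, b_c):
    """Compute one LSTM time step; each W_* has shape (hidden, hidden + input)."""
    # Concatenate the previous output and the current input, as in the gate equations
    z = np.concatenate([h_prev, x_t])

    f_t = sigmoid(W_f @ z + b_f)       # forget gate: how much of c_prev to keep
    i_t = sigmoid(W_i @ z + b_i)       # input gate: how much of the candidate to admit
    o_t = sigmoid(W_o @ z + b_o)       # output gate: how much of the cell state to expose
    c_cand = np.tanh(W_c @ z + b_c)    # candidate cell state

    c_t = f_t * c_prev + i_t * c_cand  # new cell state: gated mix of old and candidate
    h_t = o_t * np.tanh(c_t)           # hidden state (output)
    return h_t, c_t

Calling lstm_step repeatedly over a sequence, threading h_t and c_t from one step to the next, reproduces the recurrence described in this section.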