The two vectors (a⟨t−1⟩ and x⟨t⟩, respectively) enter the LSTM unit from the bottom-left corner, and are copied to each gate (Γf and Γu) upon their arrival. At each gate, the concatenated vector is multiplied by that gate's weight matrix, a bias term is added, and a sigmoid is applied to the result: Γf = σ(Wf[a⟨t−1⟩, x⟨t⟩] + bf) and Γu = σ(Wu[a⟨t−1⟩, x⟨t⟩] + bu). Because the sigmoid squashes its input into the range between zero and one, each gate holds a value in that range. Importantly, each weight matrix is unique to a given gate (Wf for the forget gate, Wu for the update gate). These weight matrices, along with their bias terms, form a subset of the learnable parameters within an LSTM unit, and are updated iteratively during backpropagation, just as we have been doing all along.
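
To make the arithmetic concrete, here is a minimal NumPy sketch of the two gate computations. The function name lstm_gates and the toy dimensions are illustrative assumptions, not from the text:

```python
import numpy as np

def sigmoid(z):
    """Squash each element of z into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_gates(a_prev, x_t, W_f, b_f, W_u, b_u):
    """Compute the forget and update gates for one time step.

    a_prev : previous hidden state, shape (n_a, 1)
    x_t    : current input,          shape (n_x, 1)
    W_f, W_u : gate weight matrices, shape (n_a, n_a + n_x)
    b_f, b_u : gate bias vectors,    shape (n_a, 1)
    """
    # Stack the two incoming vectors; both gates receive the same copy.
    concat = np.vstack([a_prev, x_t])        # shape (n_a + n_x, 1)

    # Each gate applies its own weights and bias, then a sigmoid.
    gamma_f = sigmoid(W_f @ concat + b_f)    # forget gate, entries in (0, 1)
    gamma_u = sigmoid(W_u @ concat + b_u)    # update gate, entries in (0, 1)
    return gamma_f, gamma_u

# Toy dimensions, for illustration only.
rng = np.random.default_rng(0)
n_a, n_x = 4, 3
a_prev = rng.standard_normal((n_a, 1))
x_t = rng.standard_normal((n_x, 1))
W_f = rng.standard_normal((n_a, n_a + n_x))
W_u = rng.standard_normal((n_a, n_a + n_x))
b_f = np.zeros((n_a, 1))
b_u = np.zeros((n_a, 1))

gamma_f, gamma_u = lstm_gates(a_prev, x_t, W_f, b_f, W_u, b_u)
print(gamma_f.ravel())  # every entry lies strictly between 0 and 1
```

Note that the sketch initializes Wf and Wu randomly only to show the forward computation; in training, these are the values that backpropagation would adjust.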




















































