Next, we will look at the equation that leverages the activation value we just calculated to produce a prediction (ŷt) at the given time step (t). This is represented like so:
$$\hat{y}_t = g\left(W_{ay}\, a_t + b_y\right)$$
This tells us that our layer's prediction at a time step is computed by taking the matrix-vector product of yet another temporally shared output weight matrix (Way) with the activation output (at) we just computed using the earlier equation, adding the output bias (by), and passing the result through the activation function g.
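As a minimal sketch of this computation, here is how the prediction at a single time step could be produced in NumPy. It assumes softmax as the output activation g, and all variable names and dimensions here are illustrative choices rather than details from the text:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax, standing in for the output activation g."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Illustrative sizes: hidden size 4, output size 3.
rng = np.random.default_rng(0)
W_ay = rng.standard_normal((3, 4))  # temporally shared output weight matrix
b_y = np.zeros(3)                   # output bias
a_t = rng.standard_normal(4)        # activation from the earlier equation

y_hat_t = softmax(W_ay @ a_t + b_y)  # the prediction at time step t
```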
Because each activation is computed from the previous one using the same shared weight parameters, information from previous time steps is preserved and passed through the recurrent layer to inform the current prediction. For example, the prediction at time step three leverages information from time steps one and two, as shown by the green arrow here:
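To make this flow of information concrete in code, here is a hedged NumPy sketch that unrolls the forward pass over three time steps. It assumes the common activation update a_t = tanh(Waa·a_{t-1} + Wax·x_t + ba) for the earlier equation; the tanh/softmax choices, names, and sizes are illustrative assumptions, not details from the text:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def rnn_forward(xs, a0, W_aa, W_ax, W_ay, b_a, b_y):
    """Unrolled forward pass: each activation a_t is built from a_{t-1},
    so the prediction at step 3 carries information from steps 1 and 2."""
    a_prev, preds = a0, []
    for x_t in xs:
        a_t = np.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)  # earlier (activation) equation
        preds.append(softmax(W_ay @ a_t + b_y))          # prediction equation above
        a_prev = a_t  # pass the activation forward to the next time step
    return preds

# Tiny demo: 3 time steps, input size 2, hidden size 4, output size 3.
rng = np.random.default_rng(1)
xs = [rng.standard_normal(2) for _ in range(3)]
W_aa = rng.standard_normal((4, 4))
W_ax = rng.standard_normal((4, 2))
W_ay = rng.standard_normal((3, 4))
preds = rnn_forward(xs, np.zeros(4), W_aa, W_ax, W_ay, np.zeros(4), np.zeros(3))
# preds[2] depends on xs[0] and xs[1] only through the recurrent activations.
```

Note that the same W_aa, W_ax, and W_ay are reused on every iteration of the loop: the layer applies one shared transformation at all time steps, which is exactly the parameter sharing described above.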
To formalize these computations, we mathematically show the relation between the predicted output at...