Updates and Gradient Flow
The updates can be listed as follows:
- Adjusting weight matrix Wy
- Adjusting weight matrix Ws
- Adjusting weight matrix Wx
Adjusting Weight Matrix Wy
The model can be visualized as follows:
Figure 5.18: Back propagation of loss through weight matrix Wy
For Wy, the update is simple because no other paths or variables lie between Wy and the error. The update for Wy can be expressed as follows:
Figure 5.19: Expression for weight matrix Wy
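The direct path from Wy to the error can be checked numerically. The following is a minimal sketch rather than a definitive implementation. It assumes a tanh state update St = tanh(Wx xt + Ws St-1), a linear output yt = Wy St, and a squared error E3 = 0.5 * ||y3 - d3||^2 at the third time step. The layer sizes, the random data, and the helper names forward and numerical_grad are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_in, n_hidden, n_out = 3, 4, 2

Wx = rng.normal(size=(n_hidden, n_in))      # input-to-state weights
Ws = rng.normal(size=(n_hidden, n_hidden))  # state-to-state weights
Wy = rng.normal(size=(n_out, n_hidden))     # state-to-output weights

xs = rng.normal(size=(3, n_in))  # inputs x1, x2, x3
d3 = rng.normal(size=n_out)      # assumed target at t = 3


def forward(Wy_):
    """Run three time steps and return (S3, y3, E3) for a given Wy."""
    s = np.zeros(n_hidden)
    for x in xs:
        s = np.tanh(Wx @ x + Ws @ s)   # assumed state update
    y3 = Wy_ @ s                       # assumed linear output
    E3 = 0.5 * np.sum((y3 - d3) ** 2)  # assumed squared error
    return s, y3, E3


s3, y3, E3 = forward(Wy)

# Chain rule: dE3/dWy = dE3/dy3 * dy3/dWy.
# With the assumptions above, dE3/dy3 = (y3 - d3) and dy3/dWy brings in S3,
# so the gradient is the outer product of the two vectors.
grad_Wy = np.outer(y3 - d3, s3)


def numerical_grad(eps=1e-6):
    """Central finite-difference estimate of dE3/dWy."""
    g = np.zeros_like(Wy)
    for i in range(Wy.shape[0]):
        for j in range(Wy.shape[1]):
            Wp = Wy.copy()
            Wm = Wy.copy()
            Wp[i, j] += eps
            Wm[i, j] -= eps
            g[i, j] = (forward(Wp)[2] - forward(Wm)[2]) / (2 * eps)
    return g


# Compare the analytic gradient against the finite-difference estimate.
print(np.allclose(grad_Wy, numerical_grad(), atol=1e-5))
```

Because S3 does not depend on Wy, the state is treated as a constant here and a single chain-rule step is enough.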
Adjusting Weight Matrix Ws
Figure 5.20: Back propagation of loss through weight matrix Ws with respect to S3
We can calculate the partial derivative of the error with respect to Ws using the chain rule, as shown in the previous figure. This may look like all that is needed, but remember that St depends on St-1. S3 therefore depends on S2, so we need to consider S2 as well, as shown here: