There are two potential logical enhancements to the architecture we defined in the previous section:
- Make use of the information present in the cell state while generating translations
- Make use of the previously translated words as an input in predicting the next word
The second technique is called Teacher Forcing. Essentially, by giving the previous time step's actual value as input while generating the current time step, we are tuning the network faster, and practically more accurately.