In the previous section, we learned that we could increase translation accuracy by using the teacher forcing technique, where the actual target word from the previous time step is fed to the model as input.
In this section, we will extend this idea further by assigning weights to the encoder's hidden states based on how similar the encoder and decoder vectors are at each time step. This way, certain input words receive a higher weight in the encoder's hidden vector, depending on the time step of the decoder.
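To make the weighting idea concrete, the following is a minimal NumPy sketch rather than the model we will build. It assumes that similarity is measured with a dot product and that the scores are normalized with a softmax; the function names (`softmax`, `dot_product_attention`), the array shapes, and the random stand-in vectors are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def dot_product_attention(encoder_states, decoder_states):
    # encoder_states: (source_length, hidden_size)
    # decoder_states: (target_length, hidden_size)
    # Assumption: similarity is a plain dot product between the vectors.
    scores = decoder_states @ encoder_states.T    # (target_length, source_length)
    # Each decoder time step gets weights over the input words that sum to 1.
    weights = softmax(scores, axis=-1)
    # Weighted sum of encoder states: one context vector per decoder time step.
    context = weights @ encoder_states            # (target_length, hidden_size)
    return weights, context

# Random stand-ins for real hidden states: 5 input words, 3 output steps.
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 8))
decoder_states = rng.normal(size=(3, 8))

weights, context = dot_product_attention(encoder_states, decoder_states)
print(weights.shape, context.shape)
```

Each row of `weights` corresponds to one decoder time step, so the input words that matter most can differ from one output word to the next.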