Now that you understand how the LSTM network works, let's take a step back and look at the full network architecture. As mentioned before, we are using a sequence-to-sequence model with an attention mechanism. The model consists of LSTM units grouped into the encoder and decoder parts of the network.
In a simple sequence-to-sequence model, we feed in a sentence of a given length and compress it into a single fixed-length vector that captures all the information in that sentence. We then use this vector to predict the translation. You can read more about how this works in a wonderful Google paper (https://arxiv.org/pdf/1409.3215.pdf), listed in the External links section at the end of this chapter.
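To make this concrete, here is a minimal sketch of such a plain (attention-free) encoder-decoder. The framework (PyTorch), the class name, and the hyperparameters are illustrative assumptions, not the exact implementation used later in this chapter; the point is only to show the source sentence being squeezed into one vector that the decoder then translates from.

```python
# Minimal sketch of a plain sequence-to-sequence model (no attention).
# Framework, names, and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, embed_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, embed_dim)
        # Encoder LSTM reads the source sentence token by token.
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Decoder LSTM generates the translation token by token.
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode: the final hidden/cell state is the single fixed-length
        # vector that must summarize the entire source sentence.
        _, (h, c) = self.encoder(self.src_embed(src_ids))
        # Decode: the translation is conditioned only on that vector.
        dec_out, _ = self.decoder(self.tgt_embed(tgt_ids), (h, c))
        return self.out(dec_out)  # logits over the target vocabulary

# Usage with random token IDs: batch of 2, source length 7, target length 5
model = Seq2Seq(src_vocab=10000, tgt_vocab=12000)
src = torch.randint(0, 10000, (2, 7))
tgt = torch.randint(0, 12000, (2, 5))
print(model(src, tgt).shape)  # torch.Size([2, 5, 12000])
```

Note that everything the decoder knows about the source sentence has to pass through that single fixed-length state, which becomes a bottleneck as sentences grow longer.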
That approach works, but, as always, we can and should do better. In this case...