Improving LSTMs – beam search
As we saw earlier, the generated text leaves room for improvement. Let's see whether beam search, which we discussed in Chapter 7, Long Short-Term Memory Networks, can improve the results. In beam search, instead of predicting a single bigram at a time, we look ahead a number of steps and pick the beam (that is, the candidate sequence of bigrams) with the highest joint probability, calculated separately for each beam. The joint probability of a beam is obtained by multiplying the prediction probabilities of each predicted bigram in that beam. Note that beam search is still a greedy search: at each depth of the search tree, we keep only the best candidates found so far as the tree grows, so it is not guaranteed to find the globally best beam.
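As a concrete illustration of the joint probability (with made-up numbers, not output from an actual model), consider two candidate beams of length three:

```python
from math import prod

# Made-up bigram prediction probabilities for two candidate beams
beam_a = [0.5, 0.4, 0.6]
beam_b = [0.7, 0.3, 0.5]

# The joint probability of a beam is the product of its step probabilities
print(round(prod(beam_a), 3))  # 0.12  -> beam_a is preferred
print(round(prod(beam_b), 3))  # 0.105
```

Even though beam_b starts with a more probable bigram (0.7 versus 0.5), beam_a wins overall, which is exactly the kind of decision that single-step greedy prediction cannot make.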
Implementing beam search
To implement beam search, we only need to change the text generation technique; the training and validation operations stay the same. However, the code will be more involved than the text generation flow we saw earlier...
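To make the flow concrete, here is a minimal, self-contained sketch of beam search over bigrams. This is an illustration, not the book's implementation: predict_next_probs is a hypothetical stand-in for the trained LSTM's softmax output, and vocabulary, beam_width, and beam_length are illustrative names. The sketch accumulates log probabilities rather than raw products, a common trick to avoid numerical underflow over long beams:

```python
import numpy as np

np.random.seed(0)

# Hypothetical bigram vocabulary; in the real flow, the trained LSTM
# would define the vocabulary and produce the prediction probabilities.
vocabulary = ['th', 'e ', 'ca', 't ', 'sa', 'ng', 'do', 'g ']
vocab_size = len(vocabulary)

def predict_next_probs(sequence):
    """Stand-in for the LSTM: returns a probability distribution over
    the next bigram, given the bigram IDs generated so far."""
    logits = np.random.rand(vocab_size)
    return logits / logits.sum()

def beam_search(seed_ids, beam_width=3, beam_length=5):
    """Look ahead beam_length steps, keeping only the beam_width most
    probable partial sequences at each depth (greedy pruning)."""
    # Each beam is a (sequence, log probability) pair; summing log
    # probabilities is equivalent to multiplying raw probabilities.
    beams = [(list(seed_ids), 0.0)]
    for _ in range(beam_length):
        candidates = []
        for seq, log_p in beams:
            probs = predict_next_probs(seq)
            # Expand this beam with its beam_width most probable bigrams
            for idx in np.argsort(probs)[-beam_width:]:
                candidates.append(
                    (seq + [int(idx)], log_p + np.log(probs[idx])))
        # Greedy pruning: keep only the best candidates at this depth
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)
        beams = beams[:beam_width]
    # Return the beam with the highest joint (log) probability
    return beams[0]

best_seq, best_log_p = beam_search(seed_ids=[0])
print(''.join(vocabulary[i] for i in best_seq), best_log_p)
```

Because only beam_width candidates survive each depth, the work per generated step is bounded, but a beam pruned early can never be recovered, which is why the search is not globally optimal.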