Improving LSTMs
Having a model backed by solid foundations does not always guarantee practical success in the real world. Natural language is complex; even seasoned writers sometimes struggle to produce quality content, so we cannot expect an LSTM to suddenly output meaningful, well-written text on its own. A sophisticated design that better models long-term dependencies in the data certainly helps, but we also need additional techniques at inference time to produce better text. Numerous extensions have therefore been developed to help LSTMs perform better at the prediction stage. Here we will discuss several such improvements: greedy sampling, beam search, using word vectors instead of one-hot-encoded representations of words, and using bidirectional LSTMs. It is important to note that these optimization techniques are not specific to LSTMs; any sequential model can benefit from them.
Greedy sampling
If we try to always...