NLP 2.0: Using Transformers to Generate Text
As we saw in the previous chapter, the NLP domain has made remarkable leaps in the way we understand, represent, and process textual data. From handling long-range dependencies in sequences using LSTMs and GRUs to building dense vector representations using word2vec and related techniques, the field has improved dramatically. Word embeddings became almost the de facto representation method, LSTMs became the workhorse for NLP tasks, and this combination of embeddings with LSTMs was put to best use in encoder-decoder (and related) architectures. Yet with this setup, we were hitting roadblocks to further improvement.
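As a quick refresher, the following is a minimal sketch of that embedding-plus-LSTM encoder-decoder setup, assuming TensorFlow/Keras; the vocabulary size and layer dimensions are illustrative placeholders, not values from this book.

```python
# Minimal sketch (assumed TensorFlow/Keras) of an embedding + LSTM
# encoder-decoder; sizes below are hypothetical placeholders.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 10000   # hypothetical vocabulary size
EMBED_DIM = 128      # dense word-vector dimension
HIDDEN_DIM = 256     # LSTM state size

# Encoder: token ids -> embeddings -> final LSTM states summarizing the input.
encoder_inputs = layers.Input(shape=(None,), dtype="int32")
enc_emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(encoder_inputs)
_, state_h, state_c = layers.LSTM(HIDDEN_DIM, return_state=True)(enc_emb)

# Decoder: generates the output sequence conditioned on the encoder's states.
decoder_inputs = layers.Input(shape=(None,), dtype="int32")
dec_emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(decoder_inputs)
dec_out = layers.LSTM(HIDDEN_DIM, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c]
)
logits = layers.Dense(VOCAB_SIZE)(dec_out)

model = tf.keras.Model([encoder_inputs, decoder_inputs], logits)
model.summary()
```

The key limitation of this design, which the rest of the chapter addresses, is that the entire input must be squeezed through the fixed-size recurrent state, and the sequence must be processed step by step.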
We also saw briefly in the previous chapter how research into CNN-based architectures for NLP use cases yielded certain improvements. In this chapter, we will touch upon the next set of enhancements that led to the development of today's state-of-the-art transformer architectures...