Generating text – one character at a time
Text generation offers a window into whether deep learning models learn the underlying structure of language. This chapter generates text using two different approaches. The first is an RNN-based model that generates text one character at a time.
In the previous chapters, we saw different tokenization methods based on words and sub-words. Here, text is tokenized into individual characters, including uppercase and lowercase letters, punctuation symbols, and digits, for 96 tokens in total. This extreme tokenization tests how much a model can learn about the structure of language. The model is trained to predict the next character from a given sequence of input characters. If language does have an underlying structure, the model should pick it up and generate reasonable-looking sentences.
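The character-level setup described above can be sketched in a few lines: build a vocabulary of distinct characters, map each character to an integer id, and slice the encoded text into (input window, next character) training pairs. This is a minimal illustration only; the chapter's actual 96-token vocabulary comes from its training corpus, and the sample string, window size, and variable names below are assumptions for demonstration.

```python
# Illustrative corpus; the chapter's real vocabulary is built from its
# own training data, not from this string.
text = "Hello, world! 123"

# Build the vocabulary: every distinct character becomes a token.
vocab = sorted(set(text))
char2idx = {ch: i for i, ch in enumerate(vocab)}
idx2char = {i: ch for ch, i in char2idx.items()}

# Encode the text as a sequence of integer token ids.
encoded = [char2idx[ch] for ch in text]

# Next-character prediction pairs: a window of input characters
# and the single character that follows it (hypothetical window size).
window = 8
pairs = [(encoded[i:i + window], encoded[i + window])
         for i in range(len(encoded) - window)]

# Decoding inverts the mapping, so generation can turn predicted
# ids back into readable text.
decoded = "".join(idx2char[i] for i in encoded)
print(decoded == text)
```

A model trained on such pairs sees only integer ids; whatever notion of spelling, spacing, or punctuation it acquires must be inferred from the statistics of these sequences.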
Generating coherent sentences one character at a time is a very challenging task.