In this section, we are going to consider both our experimental setups: one where the generator and discriminator predict and discriminate sequences of words, and another where the models predict and discriminate sequences on characters. Note that in both cases, there is no difference between the representation of a word or a character; they are just vectors in multidimensional space.
Assuming the same sequence length, the task of predicting a sequence of characters is harder than the task of predicting a sequence of words. First, because in the character case the model has to perform more predictions. Second, because overall entropy or uncertainty when predicting characters is higher than predicting words, as it implies predicting a sequence of characters that form a word and predicting another sequence of characters that forms another word, and that is likely to be...