Chapter 4: Training Models with Text Data
In Chapter 3, Training Models with Tabular Data, you worked through a series of recipes that demonstrated how to use fastai to train deep learning models on tabular data. In this chapter, we will examine how to use the fastai framework to train deep learning models on text datasets.
To explore deep learning with text data in fastai, we will start by taking a pre-trained language model (that is, a model that, when given a phrase, predicts which words come next) and fine-tuning it on the IMDb curated dataset. We will then use the resulting fine-tuned language model to create a text classifier for the movie review use case represented by the IMDb dataset. A text classifier predicts the class of a phrase; in the movie review use case, it predicts whether a given review is positive or negative.
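As a rough preview of this two-stage workflow, the following is a minimal sketch rather than the exact code of the recipes in this chapter. It assumes that fastai is installed and that the curated IMDb dataset can be downloaded. The function name, the epoch counts, and the encoder filename finetuned_encoder are illustrative choices, not values taken from the recipes.

```python
def train_imdb_sentiment_classifier(lm_epochs=1, clf_epochs=1):
    """Sketch: fine-tune a language model on IMDb, then build a classifier from it.

    The function name, epoch defaults, and encoder filename are hypothetical.
    """
    # Imported here so that defining the function does not itself require fastai.
    from fastai.text.all import (
        AWD_LSTM, Perplexity, TextDataLoaders, URLs, accuracy,
        language_model_learner, text_classifier_learner, untar_data,
    )

    # Download (or reuse a cached copy of) the curated IMDb dataset.
    path = untar_data(URLs.IMDB)

    # Stage 1: fine-tune the pre-trained language model on the IMDb text.
    dls_lm = TextDataLoaders.from_folder(path, is_lm=True, valid_pct=0.1)
    lm_learner = language_model_learner(
        dls_lm, AWD_LSTM, metrics=[accuracy, Perplexity()]
    )
    lm_learner.fine_tune(lm_epochs)
    # Save the encoder so the classifier can reuse what the language model learned.
    lm_learner.save_encoder("finetuned_encoder")

    # Stage 2: train a positive/negative classifier on top of that encoder.
    # Sharing the vocabulary keeps token IDs consistent between the two models.
    dls_clf = TextDataLoaders.from_folder(
        path, valid="test", text_vocab=dls_lm.vocab
    )
    clf_learner = text_classifier_learner(
        dls_clf, AWD_LSTM, drop_mult=0.5, metrics=accuracy
    )
    clf_learner.load_encoder("finetuned_encoder")
    clf_learner.fine_tune(clf_epochs)
    return clf_learner
```

Calling train_imdb_sentiment_classifier() returns a learner whose predict method can be applied to a review string to obtain the predicted label together with the class probabilities. Note that running it downloads the full dataset and is likely to be slow without a GPU.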
Finally, we apply the same approach to a standalone (that is, non-curated) text dataset of Covid-related tweets...