Comparing three text classification methods
One of the most useful things we can do with evaluation techniques is to decide which of several approaches to use in an application. Are traditional approaches such as term frequency-inverse document frequency (TF-IDF), support vector machines (SVMs), and conditional random fields (CRFs) good enough for our task, or will it be necessary to use deep learning and transformer approaches, which often produce better results at the cost of longer training time?
In this section, we will compare the performance of three approaches on a larger version of the movie review dataset that we looked at in Chapter 9. The three approaches are a small BERT model, TF-IDF vectorization with Naïve Bayes classification, and a larger BERT model.
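As a rough sketch of what such a comparison involves, the following snippet trains each system on the same training data, scores it on the same test data, and records how long training took. The two classifiers and the tiny dataset are toy stand-ins invented for illustration, not the BERT or Naïve Bayes systems used in this chapter, and the names compare, accuracy, MajorityBaseline, and KeywordBaseline are hypothetical.

```python
import time
from collections import Counter


class MajorityBaseline:
    """Toy stand-in: always predicts the most common training label."""

    def fit(self, texts, labels):
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, texts):
        return [self.label_ for _ in texts]


class KeywordBaseline:
    """Toy stand-in: predicts 'pos' if a text contains a word that
    appeared only in positive training reviews, otherwise 'neg'."""

    def fit(self, texts, labels):
        pos_words, neg_words = set(), set()
        for text, label in zip(texts, labels):
            target = pos_words if label == "pos" else neg_words
            target.update(text.lower().split())
        self.pos_only_ = pos_words - neg_words
        return self

    def predict(self, texts):
        return [
            "pos" if self.pos_only_ & set(text.lower().split()) else "neg"
            for text in texts
        ]


def accuracy(gold, predicted):
    """Fraction of test items whose predicted label matches the gold label."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def compare(systems, train, test):
    """Train and score every system on identical data so results are comparable."""
    train_texts, train_labels = train
    test_texts, test_labels = test
    results = {}
    for name, model in systems.items():
        start = time.perf_counter()
        model.fit(train_texts, train_labels)
        train_seconds = time.perf_counter() - start
        results[name] = {
            "accuracy": accuracy(test_labels, model.predict(test_texts)),
            "train_seconds": train_seconds,
        }
    return results


# Invented miniature dataset, for illustration only.
train = (
    ["a great film", "great acting", "a dull plot", "dull and slow", "slow pacing"],
    ["pos", "pos", "neg", "neg", "neg"],
)
test = (
    ["great fun", "slow start", "really great", "dull"],
    ["pos", "neg", "pos", "neg"],
)

results = compare(
    {"majority": MajorityBaseline(), "keyword": KeywordBaseline()}, train, test
)
for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.2f}, "
          f"train_seconds={scores['train_seconds']:.4f}")
```

The point of the design is that every system sees exactly the same training and test split, so differences in accuracy and training time reflect the systems themselves rather than the data they happened to receive.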
A small transformer system
We will start with the BERT system that we developed in Chapter 11. We will use the same model as in that chapter, which is one of the smallest BERT models, small_bert/bert_en_uncased_L...