Summary
Deep models perform very well when trained on large amounts of data, and BERT and GPT have shown the value of pre-training on massive corpora. However, good-quality labeled data for pre-training or fine-tuning remains hard to obtain. We used weak supervision combined with a generative model to label data cheaply. With relatively little effort, we multiplied the amount of training data by 18x. Even though the additional training data was noisy, the BiLSTM model was able to learn effectively and beat the baseline model by 0.6%.
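The core of this approach is a set of cheap, noisy labeling functions whose votes are combined by a generative label model into training labels. The following is a minimal sketch assuming a Snorkel-style workflow on a binary sentiment task; the labeling functions, keywords, and example reviews are hypothetical illustrations rather than the chapter's actual rules or data:

import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

@labeling_function()
def lf_contains_great(x):
    # Hypothetical rule: the word "great" suggests a positive review
    return POSITIVE if "great" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_contains_awful(x):
    # Hypothetical rule: the word "awful" suggests a negative review
    return NEGATIVE if "awful" in x.text.lower() else ABSTAIN

# A few unlabeled reviews standing in for a large unlabeled pool
df_unlabeled = pd.DataFrame({"text": [
    "A great, moving film with great performances.",
    "Awful pacing and even worse acting.",
    "Great soundtrack, but an awful script.",
]})

# Apply the labeling functions to build the label matrix L:
# one column per function, ABSTAIN where a function does not fire
applier = PandasLFApplier(lfs=[lf_contains_great, lf_contains_awful])
L = applier.apply(df=df_unlabeled)

# The generative label model estimates each function's accuracy and
# combines the noisy votes into one training label per example
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L, n_epochs=500, seed=42)
df_unlabeled["weak_label"] = label_model.predict(L=L)
print(df_unlabeled)

The resulting weak labels can then be fed to the downstream BiLSTM classifier as if they were hand-annotated, accepting some label noise in exchange for far more training data.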
Representation learning, or pre-training, enables transfer learning, in which fine-tuned models perform well on their downstream tasks. However, in many domains, such as medicine, labeled data may be scarce or expensive to acquire. Using the techniques covered in this chapter, the amount of training data can be expanded rapidly with little effort. Building a state-of...