Weakly Supervised Learning for Classification with Snorkel
Models such as BERT and GPT use massive amounts of unlabeled data along with an unsupervised training objective, such as masked language modeling (MLM) for BERT or next-word prediction for GPT, to learn the underlying structure of text. A small amount of task-specific labeled data is then used to fine-tune the pre-trained model through transfer learning. Such models are quite large, with hundreds of millions of parameters, and require massive datasets and substantial compute for pre-training, as well as significant compute for fine-tuning. Note that the critical problem being solved is the lack of adequate training data: if there were enough domain-specific training data, the gains from BERT-like pre-trained models would be much smaller. In certain domains, such as medicine, the vocabulary used in task-specific data is highly specific to that domain. In such cases, even modest increases in labeled training data can substantially improve model quality. However...
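To make the section's topic concrete before going into the details, the snippet below is a minimal sketch of the weak supervision workflow that Snorkel enables, assuming the open-source `snorkel` package: hand-written labeling functions encode simple heuristics and vote on unlabeled examples, and a label model aggregates their noisy, conflicting votes into probabilistic training labels. The labeling functions, keywords, and example sentences here are hypothetical and purely illustrative.

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

# Label values returned by the labeling functions; -1 means "abstain".
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Hypothetical unlabeled clinical-style sentences, for illustration only.
df_unlabeled = pd.DataFrame({
    "text": [
        "Patient shows elevated HbA1c consistent with diabetes.",
        "No evidence of malignancy was found on the scan.",
        "Follow-up visit scheduled in two weeks.",
    ]
})

@labeling_function()
def lf_mentions_diabetes(x):
    # A simple keyword heuristic standing in for domain expertise.
    return POSITIVE if "diabetes" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_negated_finding(x):
    # Negation phrases suggest the negative class.
    return NEGATIVE if "no evidence of" in x.text.lower() else ABSTAIN

# Apply every labeling function to every row; the result is an
# (n_examples, n_labeling_functions) matrix of votes.
lfs = [lf_mentions_diabetes, lf_negated_finding]
L_train = PandasLFApplier(lfs=lfs).apply(df=df_unlabeled)

# The label model estimates the accuracy of each labeling function and
# combines their votes into probabilistic labels for the unlabeled data.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L_train, n_epochs=100, seed=42)
probabilistic_labels = label_model.predict_proba(L=L_train)
```

Labels produced this way can then serve as training data for a downstream classifier, for example when fine-tuning a pre-trained model of the kind described above.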