Data augmentation using GPT-3
Generative models are becoming more and more powerful, especially in NLP. Using these to generate new, synthetic data sometimes allows us to significantly improve the results we get and regularize models. We’ll learn how to do this in this recipe.
Getting ready
While models such as BERT are effective at tasks such as text classification, they usually do not perform very well when it comes to text generation.
Other types of models, such as generative pre-trained transformer (GPT) models, can be quite impressive at generating new data. In this recipe, we will use the OpenAI API and GPT-3.5 to generate synthetic yet realistic data. Having more data is key to having more regularization in our models, and data generation is one way to collect more data.
For this recipe, you will need to install the OpenAI library with pip
install openai
.
Also, since the API we will use is not free, it is necessary to create an OpenAI account with a generated...