Summary
In this chapter, we learned how to train advanced models and saw that their training is not much more difficult than training the classical ML models described in Chapter 10. Even though the models trained here are much more complex, the same principles apply, and we can extend them to train even larger models.
We focused on GenAI in the form of BERT models (transformer-based language models in the same family as GPT) and AEs. Training these models is not very difficult, and we do not need huge computing power to do it. Our wolfBERTa model has ca. 80 million parameters, which seems like a lot, but the really large models have billions of parameters: GPT-3 has 175 billion parameters, the Megatron-Turing NLG model from Microsoft and NVIDIA has 530 billion, and GPT-4 is reported to be larger still, although its exact size has not been disclosed. The training process is the same, but we need a supercomputing architecture in order to train models at that scale.
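To make the parameter count above concrete, the following is a minimal sketch, assuming the Hugging Face transformers library, that builds a small RoBERTa-style configuration of roughly the scale discussed in this chapter and counts its trainable parameters. The specific hyperparameters (vocabulary size, number of layers, and so on) are illustrative assumptions, not the exact wolfBERTa settings.

    # A small RoBERTa-style model, comparable in scale to wolfBERTa (assumed settings)
    from transformers import RobertaConfig, RobertaForMaskedLM

    config = RobertaConfig(
        vocab_size=52_000,            # assumed tokenizer vocabulary size
        max_position_embeddings=514,  # maximum sequence length plus special positions
        hidden_size=768,
        num_hidden_layers=6,          # a shallow encoder keeps the model small
        num_attention_heads=12,
        intermediate_size=3072,
    )

    model = RobertaForMaskedLM(config)

    # Count trainable parameters; a 6-layer encoder with this vocabulary
    # lands in the tens of millions, roughly the scale quoted above.
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Parameters: {n_params / 1e6:.1f}M")

Running such a script shows how quickly the embedding table and the stacked transformer layers add up to tens of millions of parameters, and makes it easier to see why models with hundreds of billions of parameters require a supercomputing architecture rather than a single workstation.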
We have also learned that these models...