Summary
The goal of this chapter was to give you a better understanding of fine-tuning and evaluating ML models, comparing your models against open source options, and ultimately keeping humans in the loop.
We started with a recap of fine-tuning for vision, language, and everything in between, discussing the benefits of both general and specialized knowledge. We learned about fine-tuning a language-only model, and how this is generally possible with even a small amount of data; a minimal sketch of that workflow follows at the end of this summary. We also talked about fine-tuning vision-only models, which are generally much more likely to overfit, making them a more challenging proposition. We looked at fine-tuning jointly trained vision-language models, including Stable Diffusion and an interesting open source project called Riffusion. We talked about comparing performance with off-the-shelf public models. We learned about model evaluation metrics for vision, for language, and for the emerging joint vision-language space. We also looked at...
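To make the language fine-tuning discussion concrete, here is a minimal sketch using Hugging Face Transformers. The base model (distilgpt2), dataset (a small slice of wikitext-2), and hyperparameters are illustrative assumptions rather than the chapter's exact recipe; the point is simply that a small, well-chosen dataset can be enough to adapt a pretrained language model.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Assumed small base model, chosen for illustration only.
model_name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 family has no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# A small text dataset; even a few thousand examples can be enough.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:2000]")
dataset = dataset.filter(lambda ex: len(ex["text"].strip()) > 0)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="ft-out",
        per_device_train_batch_size=8,
        num_train_epochs=1,
        learning_rate=5e-5,
    ),
    train_dataset=tokenized,
    # mlm=False gives causal language modeling labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```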