Preparing the data
We’ve already seen why customizing a model is important for improving its accuracy and performance. We’ve also seen that continued pre-training is an unsupervised learning approach that requires unlabeled data, whereas fine-tuning is a supervised learning approach that requires labeled data.
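To make the distinction concrete, here is a minimal sketch of what the two kinds of training data might look like on disk. The JSONL layout and the field names ("input", "prompt", "completion") are illustrative assumptions, not a specific platform's schema; check the documentation of the service or library you use for the exact format it expects.

```python
import json

# Unlabeled corpus for continued pre-training: raw text only.
# (Field name "input" is an illustrative assumption.)
unlabeled = [
    {"input": "Quarterly revenue grew 12% year over year."},
    {"input": "The new policy takes effect next month."},
]

# Labeled pairs for fine-tuning: each record carries an input
# prompt and the desired completion (supervised learning).
# (Field names "prompt"/"completion" are illustrative assumptions.)
labeled = [
    {
        "prompt": "Summarize: Quarterly revenue grew 12% year over year.",
        "completion": "Revenue rose 12% YoY.",
    },
]

# JSONL: one JSON object per line, a common format for both cases.
with open("pretraining.jsonl", "w") as f:
    for record in unlabeled:
        f.write(json.dumps(record) + "\n")

with open("finetuning.jsonl", "w") as f:
    for record in labeled:
        f.write(json.dumps(record) + "\n")
```

The structural difference is the whole point: the pre-training file has no target output for the model to imitate, while every fine-tuning record pairs an input with the expected response.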
The type of data we provide to the model shapes how it responds. If the data is biased or contains highly correlated features, the trained custom model may not produce the responses you expect. This is true for any ML model you train, so it is essential to provide high-quality data. While I won’t cover data processing and feature engineering concepts in this book, I wanted to highlight their importance. If you wish to learn more about these concepts, you can refer to ML courses and books such as Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron and Feature Engineering for Machine Learning...
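As a quick illustration of spotting highly correlated features before training, the sketch below builds a small hypothetical feature table (the column names and values are invented for the example) and flags feature pairs whose absolute pairwise correlation exceeds a chosen threshold:

```python
import pandas as pd

# Hypothetical feature table; "income" and "spend" are deliberately
# constructed to be almost perfectly correlated.
df = pd.DataFrame({
    "age":    [52, 23, 61, 30, 45],
    "income": [40, 55, 80, 90, 110],
    "spend":  [41, 54, 81, 89, 111],
})

# Absolute pairwise Pearson correlations between features.
corr = df.corr().abs()

# Flag pairs above the threshold (0.95 is an arbitrary choice);
# one member of each flagged pair is a candidate for removal.
threshold = 0.95
pairs = [
    (a, b)
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if corr.loc[a, b] > threshold
]
print(pairs)  # [('income', 'spend')]
```

Dropping one feature of a near-duplicate pair is a common, if simple, remedy; the broader point is that inspecting the data before training is cheap compared with debugging a poorly behaving custom model afterward.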