Synthetic data generation
The data available for training and evaluating our machine learning models may be limited. In classification, for example, some classes might have only a few data points, which lowers model performance on unseen data points of those classes. We will go through a few methods here to help you improve the performance of your models in these situations.
Oversampling for imbalanced data
Imbalanced data classification is challenging because majority classes dominate both model training and performance reporting. On the reporting side, we discussed different performance metrics in the previous chapter and how you can select a reliable metric even for imbalanced data classification. Here, we want to talk about the concept of oversampling, which helps you improve the performance of your models by synthetically improving your training data. The concept of oversampling...
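As a rough illustration of the general idea, the sketch below shows the simplest form of oversampling: random oversampling with replacement, where existing minority-class data points are duplicated until every class matches the size of the largest class. This is not necessarily the method this chapter goes on to describe. The function name `random_oversample` and the toy data are hypothetical and exist only for this example.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes
    have as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from the original data so no existing row is lost.
    new_features, new_labels = list(features), list(labels)
    for cls, count in counts.items():
        # All original rows that belong to this class.
        pool = [x for x, lab in zip(features, labels) if lab == cls]
        # Draw the shortfall with replacement from that pool.
        for _ in range(target - count):
            new_features.append(rng.choice(pool))
            new_labels.append(cls)
    return new_features, new_labels


# Toy imbalanced dataset (made up): 8 points of class 0, 2 points of class 1.
features = [[float(i), float(i) * 2.0] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_features, balanced_labels = random_oversample(features, labels)
print(Counter(balanced_labels))
```

Oversampling of this kind would normally be applied to the training split only, so that the evaluation data keeps the original class distribution.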