Hands-on diffusion models in practice
Let’s study a practical example that demonstrates the usefulness of synthetic data in computer vision. To that end, we will generate and prepare our dataset, build our ML model from scratch, train it, and evaluate its performance. The dataset is available on Kaggle (https://www.kaggle.com/datasets/abdulrahmankerim/crash-car-image-hybrid-dataset-ccih). The full code, the trained model, and the results are available on GitHub under the corresponding chapter folder in the book’s repository.
Context
We want to build an ML model that classifies car images into two distinct categories: images that depict car accidents and images that do not. As you can imagine, curating a real dataset of this kind is time-consuming and error-prone. Collecting images of cars that are not involved in accidents is relatively easy. However, collecting images of cars involved in accidents, collisions, fires, and other dangerous scenarios is extremely hard. To solve this problem...