What is synthetic data?
Synthetic data is artificially generated data: it is not captured, measured, or recorded from the real world. Instead, algorithms or software are used to create it. Synthetic data can be generated by simulating natural phenomena with mathematical models or by approximating real-world processes. There are many approaches to generating synthetic data, such as leveraging game engines like Unreal and Unity, or utilizing generative models like GANs and diffusion models. As we know, ML models require large-scale datasets for training and evaluation. Collecting and annotating these datasets is extremely time-consuming, error-prone, and subject to privacy issues (please refer to Chapters 2 and 3). Synthetic data is a powerful solution to these limitations.
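To make the idea of generating data from a mathematical model concrete, here is a minimal sketch in Python. It is an illustration only, not a method taken from this book: the free-fall model, the function name `generate_free_fall_samples`, and the parameters `noise_std` and `seed` are all assumptions chosen for this example. Each sample pairs a simulated, noisy sensor reading with its exact ground-truth value, which comes for free because we generated the data ourselves.

```python
import random

G = 9.81  # gravitational acceleration in m/s^2


def generate_free_fall_samples(n_samples, noise_std=0.05, seed=0):
    """Return a list of (time, noisy_distance, true_distance) tuples.

    Hypothetical helper: distances follow d = 0.5 * g * t^2, and Gaussian
    noise stands in for an imperfect real-world sensor.
    """
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    samples = []
    for _ in range(n_samples):
        t = rng.uniform(0.0, 3.0)                      # seconds since release
        true_d = 0.5 * G * t ** 2                      # ground truth from the model
        noisy_d = true_d + rng.gauss(0.0, noise_std)   # simulated measurement
        samples.append((t, noisy_d, true_d))
    return samples


dataset = generate_free_fall_samples(1000)
```

No measurement or manual annotation was needed to produce `dataset`, and we can create as many samples as we like by changing `n_samples`. Game engines and generative models follow the same principle at a much larger scale.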
Synthetic data is useful for scenarios where collecting and annotating data is expensive, but its applications go beyond this particular...