Summary
In this chapter, we provided a primer on synthetic data and its common uses. Synthetic data is a key part of the data-centric toolkit because it gives us another route to better input data, especially when collecting new data is not feasible.
By now, you should have a clear understanding of the fundamentals of synthetic data and its potential applications. Synthetic data is often used for computer vision, natural language processing, and privacy protection applications. However, the potential of synthetic data goes well beyond these three realms.
Whole books have been dedicated to synthetic data, and we recommend that you dive deeper into the subject if you want to become a true expert in generating it.
In the next chapter, we’ll explore another powerful technique for improving your data without the need for collecting new data: programmatic labeling.