Using synthetic data to solve time and efficiency issues
Generating synthetic data automatically eliminates many elements of the real data curation and annotation pipeline. Collecting real data often requires special equipment, such as high-resolution cameras, microphones, or LiDAR, as well as engineers and technicians who are trained to use it. You lose time and money training these engineers and buying or renting the equipment. Data curators also often need to travel to various locations to collect suitable data, which means paying for transportation, accommodation, insurance, and more.
Synthetic data is an effective solution for these issues (see Figure 5.4). Because it avoids much of the travel and equipment use described above, synthetic data is also likely to have a lower carbon footprint than real data, which makes it better for the environment as well.
Data annotation is one of the main issues that make real datasets cumbersome. Annotating large...