Though rendering synthetic images has enabled a variety of computer vision applications, it is not a perfect remedy for data scarcity (or at least not yet). While computer graphics frameworks can nowadays render hyper-realistic images, they require detailed 3D models to do so (with precise surfaces and high-quality texture information). Gathering the data needed to build such models can be as expensive as, if not more expensive than, directly building a dataset of real images of the target objects.
Because 3D models sometimes have simplified geometries or lack texture information, realistic synthetic datasets are not that common. The resulting realism gap between the rendered training data and the real target images harms the performance of models. The visual cues that models learn to rely on when trained on synthetic data may not appear in real images (which may have differently saturated colors, more complex textures or surfaces, and so on).