SynSin network architecture
SynSin tackles the view synthesis problem with an end-to-end model that needs only a single image at test time. The model doesn’t need 3D data annotations, and it achieves very good accuracy compared to its baseline:
Figure 9.2: The structure of the end-to-end SynSin method
The model is trained end-to-end, and it consists of three different modules:
- Spatial feature and depth networks
- Neural point cloud renderer
- Refinement module and discriminator
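To make the data flow between these modules concrete, here is a minimal sketch of how they could be chained in a forward pass. It is not the SynSin implementation: the function name `synthesize_view` and all four callables are hypothetical placeholders, and the sketch assumes that the discriminator only scores outputs during training, so it does not appear in the inference path.

```python
def synthesize_view(image, target_pose, feature_net, depth_net, renderer, refiner):
    """Chain the modules listed above; every callable is a hypothetical stand-in."""
    # Module 1: both networks receive the same input image.
    features = feature_net(image)
    depth = depth_net(image)
    # Module 2: render the features into the target view using the predicted depth.
    projected = renderer(features, depth, target_pose)
    # Module 3: refine the rendered features into the final image.
    # Assumption: the discriminator is used only for the training loss.
    return refiner(projected)
```

Any objects with matching call signatures can be passed in, which makes it easy to swap in stubs while studying each module separately.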
Let’s dive deeper into each one to better understand the architecture.
Spatial feature and depth networks
If we zoom into the first part of Figure 9.2, we can see two different networks that take the same image as input. These are the spatial feature network (f) and the depth network (d) (Figure 9.3):
Figure 9.3: Input and outputs of the spatial feature and depth networks
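As a rough illustration of what the two networks produce, the following shape-only sketch lists the tensor shapes involved. It rests on assumptions not stated above: that both outputs keep the input resolution, that the depth map has a single channel, and that the feature map has 64 channels (an illustrative value). The function name `spatial_and_depth_shapes` is hypothetical.

```python
def spatial_and_depth_shapes(height, width, feature_channels=64):
    """Return (channels, height, width) shapes for the input and both outputs."""
    return {
        "image": (3, height, width),                       # RGB input shared by f and d
        "features": (feature_channels, height, width),     # assumed output of f
        "depth": (1, height, width),                       # assumed output of d
    }


shapes = spatial_and_depth_shapes(256, 256)
```

Under these assumptions, every pixel of the input ends up with one feature vector and one depth value.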
Given a reference image and the desired...