Understanding camera models
In this section, we will learn about camera models. In 3D deep learning, we usually need to use 2D images for 3D detection. Either 3D information is detected solely from 2D images, or 2D images are fused with depth information for higher accuracy. In either case, camera models are essential for building the correspondence between the 2D image space and the 3D world.
In PyTorch3D, there are two major camera models: the orthographic camera model, defined by the OrthographicCameras class, and the perspective camera model, defined by the PerspectiveCameras class. The following figure shows the differences between the two camera models.
Figure 1.5 – Two major camera models implemented in PyTorch3D, perspective and orthographic
Orthographic cameras use orthographic projections to map objects in the 3D world to 2D images, while perspective cameras use perspective projections. Orthographic projections ignore object depth: as shown in the figure, two objects of the same geometric size at different depths are mapped to 2D images of the same size. In perspective projections, on the other hand, an object that moves farther away from the camera is mapped to a smaller size on the 2D image.
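To make the difference concrete, here is a minimal plain-Python sketch of the two projections. It assumes an idealized camera at the origin looking down the +z axis, with a focal length (or scale) of 1 and no principal-point offset. The function names are hypothetical, and this is not how PyTorch3D implements OrthographicCameras or PerspectiveCameras; it only illustrates the underlying math: perspective projection divides x and y by the depth z, while orthographic projection simply discards z.

```python
# Illustrative sketch only (hypothetical helpers, not PyTorch3D's API).
# Assumption: idealized camera at the origin looking down +z, unit focal
# length / scale, no principal-point offset.

def perspective_project(point, focal_length=1.0):
    """Pinhole (perspective) projection: divide x and y by the depth z."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def orthographic_project(point, scale=1.0):
    """Orthographic projection: drop the depth z and keep (scaled) x and y."""
    x, y, _ = point
    return (scale * x, scale * y)


def projected_height(project, bottom, top):
    """Height on the image plane of a vertical segment from bottom to top."""
    return project(top)[1] - project(bottom)[1]


# Two objects of the same height (1.0): one at depth 2, one at depth 4.
near = ((0.0, 0.0, 2.0), (0.0, 1.0, 2.0))
far = ((0.0, 0.0, 4.0), (0.0, 1.0, 4.0))

for name, project in [("orthographic", orthographic_project),
                      ("perspective", perspective_project)]:
    h_near = projected_height(project, *near)
    h_far = projected_height(project, *far)
    print(f"{name}: near={h_near:.2f}, far={h_far:.2f}")
# orthographic: near=1.00, far=1.00
# perspective: near=0.50, far=0.25
```

Doubling the depth halves the projected height under the perspective model, while the orthographic height stays the same, which is exactly the behavior illustrated in the figure.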
Now that we have covered the basic concepts of camera models, let us look at some coding examples to see how we can create and use these camera models.