Finally, we can reconstruct the 3D scene using a process called triangulation. We are able to infer the 3D coordinates of a point because of the way epipolar geometry works. The essential matrix tells us more about the geometry of the visual scene than we might think: it encodes the relative rotation and translation between the two cameras. Because the two cameras depict the same real-world scene, we know that most of the 3D real-world points will be found in both images.
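Moreover, we know that the mapping from the 2D image points to the corresponding 3D real-world points will follow the rules of geometry. If we study a sufficiently large number of image points, we can construct, and solve, a (large) system of linear equations to estimate the real-world coordinates. Note that this is an estimate rather than the ground truth: when the camera motion comes from the essential matrix alone, the scene is recovered only up to an overall scale factor.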
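To make the idea of a linear system concrete, here is a minimal sketch of linear triangulation for a single point, using only NumPy. It is an illustration under stated assumptions, not necessarily the procedure used later in this chapter: the function name `triangulate_point`, the helper `project`, and the camera matrix `K` with its numbers are all hypothetical, and the two 3x4 projection matrices are assumed to be already known. Each view contributes two linear equations in the unknown homogeneous 3D point, and the solution is the right singular vector belonging to the smallest singular value.

```python
import numpy as np


def triangulate_point(P1, P2, x1, x2):
    """Estimate one 3D point from its pixel coordinates in two views.

    P1, P2: 3x4 projection matrices (assumed known).
    x1, x2: (x, y) image coordinates of the same point in each view.
    """
    # Each view gives two equations: x * P[2] - P[0] = 0 and y * P[2] - P[1] = 0
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution of A @ X = 0 is the last row of Vt
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # convert from homogeneous to 3D coordinates


def project(P, X):
    """Project a 3D point into an image with projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Hypothetical intrinsics and a second camera shifted sideways (made-up values)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# Synthetic check: project a known point, then recover it from the two views
X_true = np.array([0.5, -0.2, 5.0])
X_est = triangulate_point(P1, P2, project(P1, X_true), project(P2, X_true))
print(X_est)
```

With noise-free synthetic projections, the recovered point should agree with `X_true` to within numerical precision. With real, noisy matches the same system is solved in a least-squares sense, once per point pair. OpenCV provides `cv2.triangulatePoints`, which takes two projection matrices and two arrays of matched points and returns the results in homogeneous coordinates.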
Let's return to the Swiss fountain dataset. If we ask two photographers to take a picture of the fountain from...