This chapter looked at a robust feature tracking method that is fast enough to run in real time when applied to the live stream of a webcam.
First, the algorithm detects and extracts important features of an image in a way that is independent of perspective and size, be it in a template of our object of interest (train image) or in a more complex scene in which we expect the object of interest to be embedded (query image).
Matches between feature points in the two images are then found by clustering the keypoints with a fast approximate nearest-neighbor algorithm. From there, it is possible to calculate a perspective transformation that maps one set of feature points onto the other. With this information, we can outline the train image as it appears in the query image, and warp the query image so that the object of interest appears upright in the center of the screen.