In the previous chapter, you learned how to detect and track a simple object (the silhouette of a hand) in a very controlled environment. To be more specific, we instructed the user of our app to place the hand in the central region of the screen and then made assumptions about the size and shape of the object (the hand). In this chapter, we want to detect and track objects of arbitrary sizes, possibly viewed from several different angles or under partial occlusion.
For this, we will make use of feature descriptors, which capture the important properties of our object of interest in a compact form. We do this so that the object can be located even when it is embedded in a busy visual scene. We will apply our algorithm to the live stream of a webcam and do our best to keep the algorithm robust yet simple enough to run...