The goal of this chapter is to track multiple visually salient objects in a video sequence at once. Instead of labeling the objects of interest in the video ourselves, we will let the algorithm decide which regions of a video frame are worth tracking.
We have previously learned how to detect simple objects of interest (such as a human hand) in tightly controlled scenarios and how to infer geometrical features of a visual scene from camera motion. In this chapter, we ask what we can learn about a visual scene by looking at the image statistics of a large number of frames.
We will cover the following topics:
- Planning the app
- Setting up the app
- Mapping visual saliency
- Understanding mean-shift tracking
- Learning about the OpenCV Tracking API
- Putting it all together
By analyzing the Fourier spectrum of natural images, we will build...