This chapter showed a relatively simple—and yet surprisingly robust—way of recognizing a variety of hand gestures by counting the number of extended fingers.
We first saw how a task-relevant region of the image can be segmented using depth information acquired from a Microsoft Kinect 3D sensor, and how morphological operations can be used to clean up the segmentation result. By analyzing the shape of the segmented hand region, the algorithm then classifies hand gestures based on the types of convexity defects found along its contour.
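As a reminder of how the final classification step can work, here is a minimal sketch in plain Python. It assumes the convexity defects have already been extracted and converted to (start, end, far) point triples; the function names `angle_deg` and `count_fingers` and the 80-degree threshold are illustrative assumptions, not necessarily the values used in this chapter's code. The heuristic is that a defect with an acute angle at its farthest point lies between two extended fingers.

```python
import math


def angle_deg(far, start, end):
    """Return the angle (in degrees) at `far` between the rays to `start` and `end`."""
    v1 = (start[0] - far[0], start[1] - far[1])
    v2 = (end[0] - far[0], end[1] - far[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2 of |cross| and dot is numerically stable for small and large angles
    return math.degrees(math.atan2(abs(cross), dot))


def count_fingers(defects, thresh_deg=80.0):
    """Estimate the number of extended fingers from convexity defects.

    `defects` is a list of (start, end, far) tuples of (x, y) points.
    `thresh_deg` is an assumed cutoff: defects with a smaller angle are
    treated as gaps between two extended fingers.
    """
    if not defects:
        # Assumption: no defects at all means no extended fingers (a fist)
        return 0
    num_fingers = 1  # at least one defect suggests at least one finger
    for start, end, far in defects:
        if angle_deg(far, start, end) < thresh_deg:
            num_fingers += 1
    return min(num_fingers, 5)  # a hand has at most five fingers


# Hypothetical defects: a narrow gap between fingers and a shallow, wide one
narrow_gap = ((-1, 4), (1, 4), (0, 0))
wide_gap = ((-5, 1), (5, 1), (0, 0))
print(count_fingers([narrow_gap, narrow_gap, wide_gap]))
```

With OpenCV, the triples would typically come from `cv2.convexityDefects`, which returns indices into the contour for the start, end, and farthest points of each defect, provided the hull was computed with `returnPoints=False`.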
Once again, using OpenCV to perform the desired task did not require us to write a large amount of code. Instead, the challenge was to gain an important insight that allowed us to use the built-in functionality of OpenCV effectively.
Gesture recognition is a popular but challenging field in computer science, with applications in a large number of areas, such as Human-Computer Interaction (HCI), video surveillance, and even the video game industry. You can now use your advanced understanding of segmentation and structure analysis to build your own state-of-the-art gesture recognition system. Another approach you might want to use for hand gesture recognition is to train a deep image classification network on hand gestures. We will discuss deep networks for image classification in Chapter 9, Learning to Classify and Localize Objects.
In the next chapter, we will continue to focus on detecting objects of interest in visual scenes, but we will assume a much more complicated case: viewing the object from an arbitrary perspective and distance. To do this, we will combine perspective transformations with scale-invariant feature descriptors to develop a robust feature-matching algorithm.