Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
OpenCV 4 with Python Blueprints
OpenCV 4 with Python Blueprints

OpenCV 4 with Python Blueprints: Build creative computer vision projects with the latest version of OpenCV 4 and Python 3 , Second Edition

Arrow left icon
Profile Icon Michael Beyeler (USD) Profile Icon Dr. Menua Gevorgyan Profile Icon Mamikonyan Profile Icon Michael Beyeler
Arrow right icon
€8.99 €29.99
Full star icon Full star icon Full star icon Full star icon Full star icon 5 (4 Ratings)
eBook Mar 2020 366 pages 2nd Edition
eBook
€8.99 €29.99
Paperback
€36.99
Subscription
Free Trial
Renews at €18.99p/m
Arrow left icon
Profile Icon Michael Beyeler (USD) Profile Icon Dr. Menua Gevorgyan Profile Icon Mamikonyan Profile Icon Michael Beyeler
Arrow right icon
€8.99 €29.99
Full star icon Full star icon Full star icon Full star icon Full star icon 5 (4 Ratings)
eBook Mar 2020 366 pages 2nd Edition
eBook
€8.99 €29.99
Paperback
€36.99
Subscription
Free Trial
Renews at €18.99p/m
eBook
€8.99 €29.99
Paperback
€36.99
Subscription
Free Trial
Renews at €18.99p/m

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
OR
Modal Close icon
Payment Processing...
tick Completed

Billing Address

Table of content icon View table of contents Preview book icon Preview Book

OpenCV 4 with Python Blueprints

Hand Gesture Recognition Using a Kinect Depth Sensor

The goal of this chapter is to develop an app that detects and tracks simple hand gestures in real time, using the output of a depth sensor, such as that of a Microsoft Kinect 3D sensor or an ASUS Xtion sensor. The app will analyze each captured frame to perform the following tasks:

  • Hand region segmentation: The user's hand region will be extracted in each frame by analyzing the depth map output of the Kinect sensor, which is done by thresholding, applying some morphological operations, and finding connected components.
  • Hand shape analysis: The shape of the segmented hand region will be analyzed by determining contours, convex hull, and convexity defects.
  • Hand gesture recognition: The number of extended fingers will be determined based on the hand contour's convexity defects, and the gesture will be classified accordingly (with no extended fingers corresponding to a fist, and five extended fingers corresponding to an open hand).

Gesture recognition is an ever-popular topic in computer science. This is because it not only enables humans to communicate with machines (Human-Machine Interaction (HMI)) but also constitutes the first step for machines to begin understanding human body language. With affordable sensors such as Microsoft Kinect or Asus Xtion and open source software such as OpenKinect and OpenNI, it has never been easier to get started in the field yourself. So, what shall we do with all this technology?

In this chapter, we will cover the following topics:

  • Planning the app
  • Setting up the app
  • Tracking hand gestures in real time
  • Understanding hand region segmentation
  • Performing hand shape analysis
  • Performing hand gesture recognition

The beauty of the algorithm that we are going to implement in this chapter is that it works well for many hand gestures, yet it is simple enough to run in real time on a generic laptop. Also, if we want, we can easily extend it to incorporate more complicated hand-pose estimations.

Once you complete the app, you will understand how to use depth sensors in your own apps. You will learn how to compose shapes of interest with OpenCV from the depth information, as well as understanding how to analyze shapes with OpenCV, using their geometric properties.

Getting started

This chapter requires you to have a Microsoft Kinect 3D sensor installed. Alternatively, you may install an Asus Xtion sensor or any other depth sensor for which OpenCV has built-in support.

First, install OpenKinect and libfreenect from http://www.openkinect.org/wiki/Getting_Started. You can find the code that we present in this chapter at our GitHub repository: https://github.com/PacktPublishing/OpenCV-4-with-Python-Blueprints-Second-Edition/tree/master/chapter2.

Let's first plan the application we are going to create in this chapter.

Planning the app

The final app will consist of the following modules and scripts:

  • gestures: This is a module that consists of an algorithm for recognizing hand gestures.
  • gestures.process: This is a function that implements the entire process flow of hand gesture recognition. It accepts a single-channel depth image (acquired from the Kinect depth sensor) and returns an annotated Blue, Green, Red (BGR) color image with an estimated number of extended fingers.
  • chapter2: This is the main script for the chapter.
  • chapter2.main: This is the main function routine that iterates over frames acquired from a depth sensor that uses .process gestures to process frames, and then illustrates results.

The end product looks like this:

No matter how many fingers of a hand are extended, the algorithm correctly segments the hand region (white), draws the corresponding convex hull (the green line surrounding the hand), finds all convexity defects that belong to the spaces between fingers (large green points) while ignoring others (small red points), and infers the correct number of extended fingers (the number in the bottom-right corner), even for a fist.

Now, let's set up the application in the next section.

Setting up the app

Before we can get down to the nitty-gritty of our gesture recognition algorithm, we need to make sure that we can access the depth sensor and display a stream of depth frames. In this section, we will cover the following things that will help us set up the app:

  • Accessing the Kinect 3D sensor
  • Utilizing OpenNI-compatible sensors
  • Running the app and main function routine

First, we will look at how to use the Kinect 3D sensor.

Accessing the Kinect 3D sensor

The easiest way to access a Kinect sensor is by using an OpenKinect module called freenect. For installation instructions, take a look at the preceding section.

The freenect module has functions such as sync_get_depth() and sync_get_video(), used to obtain images synchronously from the depth sensor and camera sensor respectively. For this chapter, we will need only the Kinect depth map, which is a single-channel (grayscale) image in which each pixel value is the estimated distance from the camera to a particular surface in the visual scene.

Here, we will design a function that will read a frame from the sensor and convert it to the desired format, and return the frame together with a success status, as follows:

def read_frame(): -> Tuple[bool,np.ndarray]:

The function consists of the following steps:

  1. Grab a frame; terminate the function if a frame was not acquired, like this:
    frame, timestamp = freenect.sync_get_depth() 
if frame is None:
return False, None

The sync_get_depth method returns both the depth map and a timestamp. By default, the map is in an 11-bit format. The last 10 bits of the sensor describes the depth, while the first bit states that the distance estimation was not successful when it's equal to 1.

  1. It is a good idea to standardize the data into an 8-bit precision format, as an 11-bit format is inappropriate to be visualized with cv2.imshow right away, as well as in the future. We might want to use some different sensor that returns in a different format, as follows:
np.clip(depth, 0, 2**10-1, depth) 
depth >>= 2 

In the previous code, we have first clipped the values to 1,023 (or 2**10-1) to fit in 10 bits. Such clipping results in the assignment of the undetected distance to the farthest possible point. Next, we shift 2 bits to the right to fit the distance in 8 bits.

  1. Finally, we convert the image into an 8-bit unsigned integer array and return the result, as follows:
return True, depth.astype(np.uint8) 

Now, the depth image can be visualized as follows:

cv2.imshow("depth", read_frame()[1]) 

Let's see how to use OpenNI-compatible sensors in the next section.

Utilizing OpenNI-compatible sensors

To use an OpenNI-compatible sensor, you must first make sure that OpenNI2 is installed and that your version of OpenCV was built with the support of OpenNI. The build information can be printed as follows:

import cv2
print(cv2.getBuildInformation())

If your version was built with OpenNI support, you will find it under the Video I/O section. Otherwise, you will have to rebuild OpenCV with OpenNI support, which is done by passing the -D WITH_OPENNI2=ON flag to cmake.

After the installation process is complete, you can access the sensor similarly to other video input devices, using cv2.VideoCapture. In this app, in order to use an OpenNI-compatible sensor instead of a Kinect 3D sensor, you have to cover the following steps:

  1. Create a video capture that connects to your OpenNI-compatible sensor, like this:
device = cv2.cv.CV_CAP_OPENNI 
capture = cv2.VideoCapture(device) 

If you want to connect to Asus Xtion, the device variable should be assigned to the cv2.CV_CAP_OPENNI_ASUS value instead.

  1. Change the input frame size to the standard Video Graphics Array (VGA) resolution, as follows:
capture.set(cv2.cv.CV_CAP_PROP_FRAME_WIDTH, 640) 
capture.set(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT, 480) 
  1. In the previous section, we designed the read_frame function, which accesses the Kinect sensor using freenect. In order to read depth images from the video capture, you have to change that function to the following one:
def read_frame():
if not capture.grab():
return False,None
return capture.retrieve(cv2.CAP_OPENNI_DEPTH_MAP)

You will note that we have used the grab and retrieve methods instead of the read method. The reason is that the read method of cv2.VideoCapture is inappropriate when we need to synchronize a set of cameras or a multi-head camera, such as a Kinect.

For such cases, you grab frames from multiple sensors at a certain moment in time with the grab method and then retrieve the data of the sensors of interest with the retrieve method. For example, in your own apps, you might also need to retrieve a BGR frame (standard camera frame), which can be done by passing cv2.CAP_OPENNI_BGR_IMAGE to the retrieve method.

So, now that you can read data from your sensor, let's see how to run the application in the next section.

Running the app and main function routine

The chapter2.py script is responsible for running the app, and it first imports the following modules:

import cv2
import numpy as np
from gestures import recognize
from frame_reader import read_frame

The recognize function is responsible for recognizing a hand gesture, and we will compose it later in this chapter. We have also placed the read_frame method that we composed in the previous section in a separate script, for convenience.

In order to simplify the segmentation task, we will instruct the user to place their hand in the center of the screen. To provide a visual aid for this, we create the following function:

def draw_helpers(img_draw: np.ndarray) -> None:
# draw some helpers for correctly placing hand
height, width = img_draw.shape[:2]
color = (0,102,255)
cv2.circle(img_draw, (width // 2, height // 2), 3, color, 2)
cv2.rectangle(img_draw, (width // 3, height // 3),
(width * 2 // 3, height * 2 // 3), color, 2)

The function draws a rectangle around the image center and highlights the center pixel of the image in orange.

All the heavy lifting is done by the main function, shown in the following code block:

def main():
for _, frame in iter(read_frame, (False, None)):

The function iterates over grayscale frames from Kinect, and, in each iteration, it covers the following steps:

  1. Recognize hand gestures using the recognize function, which returns the estimated number of extended fingers (num_fingers) and an annotated BGR color image, as follows:
num_fingers, img_draw = recognize(frame)
  1. Call the draw_helpers function on the annotated BGR image in order to provide a visual aid for hand placement, as follows:
 draw_helpers(img_draw)
  1. Finally, the main function draws the number of fingers on the annotated frame, displays results with cv2.imshow, and sets termination criteria, as follows:
        # print number of fingers on image
cv2.putText(img_draw, str(num_fingers), (30, 30),
cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255))
cv2.imshow("frame", img_draw)
# Exit on escape
if cv2.waitKey(10) == 27:
break

So, now that we have the main script, you will note that the only function that we are missing is the recognize function. In order to track hand gestures, we need to compose this function, which we will do in the next section.

Tracking hand gestures in real time

Hand gestures are analyzed by the recognize function; this is where the real magic takes place. This function handles the entire process flow, from the raw grayscale image to a recognized hand gesture. It returns the number of fingers and the illustration frame. It implements the following procedure:

  1. It extracts the user's hand region by analyzing the depth map (img_gray), and returns a hand region mask (segment), like this:
def recognize(img_gray: np.ndarray) -> Tuple[int,np.ndarray]:
# segment arm region
segment = segment_arm(img_gray)
  1. It performs contour analysis on the hand region mask (segment). Then, it returns the largest contour found in the image (contour) and any convexity defects (defects), as follows:
# find the hull of the segmented area, and based on that find the
# convexity defects
contour, defects = find_hull_defects(segment)
  1. Based on the contour found and the convexity defects, it detects the number of extended fingers (num_fingers) in the image. Then, it creates an illustration image (img_draw) using (segment) image as a template, and annotates it with contour and defect points, like this:
img_draw = cv2.cvtColor(segment, cv2.COLOR_GRAY2RGB)
num_fingers, img_draw = detect_num_fingers(contour,
defects, img_draw)
  1. Finally, the estimated number of extended fingers (num_fingers), as well as the annotated output image (img_draw), are returned, as follows:
return num_fingers, img_draw

In the next section, let's learn how to accomplish hand region segmentation, which we used at the beginning of the procedure.

Understanding hand region segmentation

The automatic detection of an arm—and later, the hand region—could be designed to be arbitrarily complicated, maybe by combining information about the shape and color of an arm or hand. However, using skin color as a determining feature to find hands in visual scenes might fail terribly in poor lighting conditions or when the user is wearing gloves. Instead, we choose to recognize the user's hand by its shape in the depth map.

Allowing hands of all sorts to be present in any region of the image unnecessarily complicates the mission of the present chapter, so we make two simplifying assumptions:

  • We will instruct the user of our app to place their hand in front of the center of the screen, orienting their palm roughly parallel to the orientation of the Kinect sensor so that it is easier to identify the corresponding depth layer of the hand.
  • We will also instruct the user to sit roughly 1 to 2 meters away from the Kinect and to slightly extend their arm in front of their body so that the hand will end up in a slightly different depth layer than the arm. However, the algorithm will still work even if the full arm is visible.

In this way, it will be relatively straightforward to segment the image based on the depth layer alone. Otherwise, we would have to come up with a hand detection algorithm first, which would unnecessarily complicate our mission. If you feel adventurous, feel free to do this on your own.

Let's see how to find the most prominent depth of the image center region in the next section.

Finding the most prominent depth of the image center region

Once the hand is placed roughly in the center of the screen, we can start finding all image pixels that lie on the same depth plane as the hand. This is done by following these steps:

  1. First, we simply need to determine the most prominent depth value of the center region of the image. The simplest approach would be to look only at the depth value of the center pixel, like this:
width, height = depth.shape 
center_pixel_depth = depth[width/2, height/2] 
  1. Then, create a mask in which all pixels at a depth of center_pixel_depth are white and all others are black, as follows:
import numpy as np 
 
depth_mask = np.where(depth == center_pixel_depth, 255, 
0).astype(np.uint8)

However, this approach will not be very robust, because there is the chance that it will be compromised by the following:

  • Your hand will not be placed perfectly parallel to the Kinect sensor.
  • Your hand will not be perfectly flat.
  • The Kinect sensor values will be noisy.

Therefore, different regions of your hand will have slightly different depth values.

The segment_arm method takes a slightly better approach—it looks at a small neighborhood in the center of the image and determines the median depth value. This is done by following these steps:

  1. First, we find the center region (for example, 21 x 21 pixels) of the image frame, like this:
def segment_arm(frame: np.ndarray, abs_depth_dev: int = 14) -> np.ndarray:
height, width = frame.shape
# find center (21x21 pixels) region of imageheight frame
center_half = 10 # half-width of 21 is 21/2-1
center = frame[height // 2 - center_half:height // 2 + center_half,
width // 2 - center_half:width // 2 + center_half]
  1. Then, we determine the median depth value, med_val, as follows:
med_val = np.median(center) 

We can now compare med_val with the depth value of all pixels in the image and create a mask in which all pixels whose depth values are within a particular range [med_val-abs_depth_dev, med_val+abs_depth_dev] are white, and all other pixels are black.

However, for reasons that will become clear in a moment, let's paint the pixels gray instead of white, like this:

frame = np.where(abs(frame - med_val) <= abs_depth_dev,
128, 0).astype(np.uint8)
  1. The result will look like this:

You will note that the segmentation mask is not smooth. In particular, it contains holes at points where the depth sensor failed to make a prediction. Let's learn how to apply morphological closing to smoothen the segmentation mask, in the next section.

Applying morphological closing for smoothening

A common problem with segmentation is that a hard threshold typically results in small imperfections (that is, holes, as in the preceding image) in the segmented region. These holes can be alleviated by using morphological opening and closing. When it is opened, it removes small objects from the foreground (assuming that the objects are bright on a dark foreground), whereas closing removes small holes (dark regions).

This means that we can get rid of the small black regions in our mask by applying morphological closing (dilation followed by erosion) with a small 3 x 3-pixel kernel, as follows:

kernel = np.ones((3, 3), np.uint8)
frame = cv2.morphologyEx(frame, cv2.MORPH_CLOSE, kernel)

The result looks a lot smoother, as follows:

Notice, however, that the mask still contains regions that do not belong to the hand or arm, such as what appears to be one of the knees on the left and some furniture on the right. These objects just happen to be on the same depth layer of my arm and hand. If possible, we could now combine the depth information with another descriptor, maybe a texture- or skeleton-based hand classifier that would weed out all non-skin regions.

An easier approach is to realize that most of the time, hands are not connected to knees or furniture. Let's learn how to find connected components in a segmentation mask.

Finding connected components in a segmentation mask

We already know that the center region belongs to the hand. For such a scenario, we can simply apply cv2.floodfill to find all the connected image regions.

Before we do this, we want to be absolutely certain that the seed point for the flood fill belongs to the right mask region. This can be achieved by assigning a grayscale value of 128 to the seed point. However, we also want to make sure that the center pixel does not, by any coincidence, lie within a cavity that the morphological operation failed to close.

So, let's set a small 7 x 7-pixel region with a grayscale value of 128 instead, like this:

small_kernel = 3
frame[height // 2 - small_kernel:height // 2 + small_kernel,
width // 2 - small_kernel:width // 2 + small_kernel] = 128

As flood filling (as well as morphological operations) is potentially dangerous, OpenCV requires the specification of a mask that avoids flooding the entire image. This mask has to be 2 pixels wider and taller than the original image and has to be used in combination with the cv2.FLOODFILL_MASK_ONLY flag.

It can be very helpful to constrain the flood filling to a small region of the image or a specific contour so that we need not connect two neighboring regions that should never have been connected in the first place. It's better to be safe than sorry, right?

Nevertheless, today, we feel courageous! Let's make the mask entirely black, like this:

mask = np.zeros((height + 2, width + 2), np.uint8)

Then, we can apply the flood fill to the center pixel (the seed point), and paint all the connected regions white, as follows:

flood = frame.copy()
cv2.floodFill(flood, mask, (width // 2, height // 2), 255,
flags=4 | (255 << 8))

At this point, it should be clear why we decided to start with a gray mask earlier. We now have a mask that contains white regions (arm and hand), gray regions (neither arm nor hand, but other things in the same depth plane), and black regions (all others). With this setup, it is easy to apply a simple binary threshold to highlight only the relevant regions of the pre-segmented depth plane, as follows:

ret, flooded = cv2.threshold(flood, 129, 255, cv2.THRESH_BINARY) 

This is what the resulting mask looks like:

The resulting segmentation mask can now be returned to the recognize function, where it will be used as an input to the find_hull_defects function, as well as a canvas for drawing the final output image (img_draw). The function analyzes the shape of a hand in order to detect the defects of a hull that corresponds to the hand. Let's learn how to perform hand shape analysis in the next section.

Performing hand shape analysis

Now that we know (roughly) where the hand is located, we aim to learn something about its shape. In this app, we will make a decision on which exact gesture is shown, based on convexity defects of a contour corresponding to the hand.

Let's move on and learn how to determine the contour of the segmented hand region in the next section, which will be the first step in our hand shape analysis.

Determining the contour of the segmented hand region

The first step involves determining the contour of the segmented hand region. Luckily, OpenCV comes with a pre-canned version of such an algorithm—cv2.findContours. This function acts on a binary image and returns a set of points that are believed to be part of the contour. As there might be multiple contours present in the image, it is possible to retrieve an entire hierarchy of contours, as follows:

def find_hull_defects(segment: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
contours, hierarchy = cv2.findContours(segment, cv2.RETR_TREE,
cv2.CHAIN_APPROX_SIMPLE)

Furthermore, because we do not know which contour we are looking for, we have to make an assumption to clean up the contour result, since it is possible that some small cavities are left over even after the morphological closing. However, we are fairly certain that our mask contains only the segmented area of interest. We will assume that the largest contour found is the one that we are looking for.

Thus, we simply traverse the list of contours, calculate the contour area (cv2.contourArea), and store only the largest one (max_contour), like this:

max_contour = max(contours, key=cv2.contourArea) 

The contour that we found might still have too many corners. We approximate the contour with a similar contour that does not have sides that are less than 1 percent of the perimeter of the contour, like this:

epsilon = 0.01 * cv2.arcLength(max_contour, True)
max_contour = cv2.approxPolyDP(max_contour, epsilon, True)

Let's learn how to find the convex hull of a contour area, in the next section.

Finding the convex hull of a contour area

Once we have identified the largest contour in our mask, it is straightforward to compute the convex hull of the contour area. The convex hull is basically the envelope of the contour area. If you think of all the pixels that belong to the contour area as a set of nails poking out of a board, then a tight rubber band encircles all the nails forming the convex hull shape. We can get the convex hull directly from our largest contour (max_contour), like this:

hull = cv2.convexHull(max_contour, returnPoints=False) 

As we now want to look at convexity deficits in this hull, we are instructed by the OpenCV documentation to set the returnPoints optional flag to False.

The convex hull drawn in yellow around a segmented hand region looks like this:

As mentioned previously, we will determine a hand gesture based on convexity defects. Let's move on and learn how to find the convexity defects of a convex hull in the next section, which will bring us one step closer to recognizing hand gestures.

Finding the convexity defects of a convex hull

As is evident from the preceding screenshot, not all points on the convex hull belong to the segmented hand region. In fact, all the fingers and the wrist cause severe convexity defects—that is, points of the contour that are far away from the hull.

We can find these defects by looking at both the largest contour (max_contour) and the corresponding convex hull (hull), as follows:

defects = cv2.convexityDefects(max_contour, hull) 

The output of this function (defects) is a NumPy array containing all defects. Each defect is an array of four integers that are start_index (index of the point in the contour where the defect begins), end_index (index of the point in the contour where the defect ends), farthest_pt_index (the index of the farthest point from the convex hull within the defect), and fixpt_depth (the distance between the farthest point and the convex hull).

We will make use of this information in just a moment when we try to estimate the number of extended fingers.

For now, though, our job is done. The extracted contour (max_contour) and convexity defects (defects) can be returned to recognize, where they will be used as inputs to detect_num_fingers, as follows:

return max_contour, defects 

So, now that we have found the defects, let's move on and learn how to perform hand gesture recognition using the convexity defects, which will bring us toward the completion of the app.

Performing hand gesture recognition

What remains to be done is to classify the hand gesture based on the number of extended fingers. For example, if we find five extended fingers, we assume the hand to be open, whereas no extended fingers implies a fist. All that we are trying to do is count from zero to five, and make the app recognize the corresponding number of fingers.

This is actually trickier than it might seem at first. For example, people in Europe might count to three by extending their thumb, index finger, and middle finger. If you do that in the US, people there might get horrendously confused, because they do not tend to use their thumbs when signaling the number two.

This might lead to frustration, especially in restaurants (trust me). If we could find a way to generalize these two scenarios—maybe by appropriately counting the number of extended fingers, we would have an algorithm that could teach simple hand gesture recognition to not only a machine but also (maybe) to a person of average intellect.

As you might have guessed, the answer is related to convexity defects. As mentioned earlier, extended fingers cause defects in the convex hull. However, the inverse is not true; that is, not all convexity defects are caused by fingers! There might be additional defects caused by the wrist, as well as the overall orientation of the hand or the arm. How can we distinguish between these different causes of defects?

Let's distinguish between different cases of convexity defects, in the next section.

Distinguishing between different causes of convexity defects

The trick is to look at the angle between the farthest point from the convex hull point within the defect (farthest_pt_index) and the start and endpoints of the defect (start_index and end_index, respectively), as illustrated in the following screenshot:

In the previous screenshot, the orange markers serve as a visual aid to center the hand in the middle of the screen, and the convex hull is outlined in green. Each red dot corresponds to the point farthest from the convex hull (farthest_pt_index) for every convexity defect detected. If we compare a typical angle that belongs to two extended fingers (such as θj) to an angle that is caused by general hand geometry (such as θi), we notice that the former is much smaller than the latter.

This is obviously because humans can spread their fingers only a little, thus creating a narrow angle made by the farthest defect point and the neighboring fingertips. Therefore, we can iterate over all convexity defects and compute the angle between the said points. For this, we will need a utility function that calculates the angle (in radians) between two arbitrary—a list like vectors, v1 and v2, as follows:

def angle_rad(v1, v2): 
    return np.arctan2(np.linalg.norm(np.cross(v1, v2)), 
np.dot(v1, v2))

This method uses the cross product to compute the angle, rather than doing it in the standard way. The standard way of calculating the angle between two vectors v1 and v2 is by calculating their dot product and dividing it by the norm of v1 and the norm of v2. However, this method has two imperfections:

  • You have to manually avoid division by zero if either the norm of v1 or the norm of v2 is zero.
  • The method returns relatively inaccurate results for small angles.

Similarly, we provide a simple function to convert an angle from degrees to radians, illustrated here:

def deg2rad(angle_deg): 
    return angle_deg/180.0*np.pi 

In the next section, we'll see how to classify hand gestures based on the number of extended fingers.

Classifying hand gestures based on the number of extended fingers

What remains to be done is to actually classify the hand gesture based on the number of instances of extended fingers. The classification is done using the following function:

def detect_num_fingers(contour: np.ndarray, defects: np.ndarray,
img_draw: np.ndarray, thresh_deg: float = 80.0) -> Tuple[int, np.ndarray]:

The function accepts the detected contour (contour), the convexity defects (defects), a canvas to draw on (img_draw), and a cutoff angle that can be used as a threshold to classify whether convexity defects are caused by extended fingers or not (thresh_deg).

Except for the angle between the thumb and the index finger, it is rather hard to get anything close to 90 degrees, so anything close to that number should work. We do not want the cutoff angle to be too high, because that might lead to errors in classifications. The complete function will return the number of fingers and the illustration frame, and consists of the following steps:

  1. First, let's focus on special cases. If we do not find any convexity defects, it means that we possibly made a mistake during the convex hull calculation, or there are simply no extended fingers in the frame, so we return 0 as the number of detected fingers, as follows:
if defects is None: 
    return [0, img_draw] 
  1. However, we can take this idea even further. Due to the fact that arms are usually slimmer than hands or fists, we can assume that the hand geometry will always generate at least two convexity defects (which usually belong to the wrists). So, if there are no additional defects, it implies that there are no extended fingers:
if len(defects) <= 2: 
    return [0, img_draw] 
  1. Now that we have ruled out all special cases, we can begin counting real fingers. If there is a sufficient number of defects, we will find a defect between every pair of fingers. Thus, in order to get the number right (num_fingers), we should start counting at 1, like this:
num_fingers = 1 
  1. Then, we start iterating over all convexity defects. For each defect, we extract the three points and draw its hull for visualization purposes, as follows:
# Defects are of shape (num_defects,1,4)
for defect in defects[:, 0, :]:
# Each defect is an array of four integers.
# First three indexes of start, end and the furthest
# points respectively
start, end, far = [contour[i][0] for i in defect[:3]]
# draw the hull
cv2.line(img_draw, tuple(start), tuple(end), (0, 255, 0), 2)
  1. Then, we compute the angle between the two edges from far to start and from far to end. If the angle is smaller than thresh_deg degrees, it means that we are dealing with a defect that is most likely caused by two extended fingers. In such cases, we want to increment the number of detected fingers (num_fingers) and draw the point with green. Otherwise, we draw the point with red, as follows:
# if angle is below a threshold, defect point belongs to two
# extended fingers
if angle_rad(start - far, end - far) < deg2rad(thresh_deg):
# increment number of fingers
num_fingers += 1

# draw point as green
cv2.circle(img_draw, tuple(far), 5, (0, 255, 0), -1)
else:
# draw point as red
cv2.circle(img_draw, tuple(far), 5, (0, 0, 255), -1)
  1. After iterating over all convexity defects, we return the number of detected fingers and the assembled output image, like this:
return min(5, num_fingers), img_draw

Computing the minimum will make sure that we do not exceed the common number of fingers per hand.

The result can be seen in the following screenshot:

Interestingly, our app is able to detect the correct number of extended fingers in a variety of hand configurations. Defect points between extended fingers are easily classified as such by the algorithm, and others are successfully ignored.

Summary

This chapter showed a relatively simple—and yet surprisingly robust—way of recognizing a variety of hand gestures by counting the number of extended fingers.

The algorithm first shows how a task-relevant region of the image can be segmented using depth information acquired from a Microsoft Kinect 3D sensor, and how morphological operations can be used to clean up the segmentation result. By analyzing the shape of the segmented hand region, the algorithm comes up with a way to classify hand gestures based on the types of convexity effects found in the image.

Once again, mastering our use of OpenCV to perform the desired task did not require us to produce a large amount of code. Instead, we were challenged to gain an important insight that made us use the built-in functionality of OpenCV in an effective way.

Gesture recognition is a popular but challenging field in computer science, with applications in a large number of areas, such as Human-Computer Interaction (HCI), video surveillance, and even the video game industry. You can now use your advanced understanding of segmentation and structure analysis to build your own state-of-the-art gesture recognition system. Another approach you might want to use for hand gesture recognition is to train a deep image classification network on hand gestures. We will discuss deep networks for image classifications in Chapter 9, Learning to Classify and Localize Objects.

In the next chapter, we will continue to focus on detecting objects of interest in visual scenes, but we will assume a much more complicated case: viewing the object from an arbitrary perspective and distance. To do this, we will combine perspective transformations with scale-invariant feature descriptors to develop a robust feature-matching algorithm.

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Understand how to capture high-quality image data, detect and track objects, and process the actions of animals or humans
  • Implement your learning in different areas of computer vision
  • Explore advanced concepts in OpenCV such as machine learning, artificial neural network, and augmented reality

Description

OpenCV is a native cross-platform C++ library for computer vision, machine learning, and image processing. It is increasingly being adopted in Python for development. This book will get you hands-on with a wide range of intermediate to advanced projects using the latest version of the framework and language, OpenCV 4 and Python 3.8, instead of only covering the core concepts of OpenCV in theoretical lessons. This updated second edition will guide you through working on independent hands-on projects that focus on essential OpenCV concepts such as image processing, object detection, image manipulation, object tracking, and 3D scene reconstruction, in addition to statistical learning and neural networks. You’ll begin with concepts such as image filters, Kinect depth sensor, and feature matching. As you advance, you’ll not only get hands-on with reconstructing and visualizing a scene in 3D but also learn to track visually salient objects. The book will help you further build on your skills by demonstrating how to recognize traffic signs and emotions on faces. Later, you’ll understand how to align images, and detect and track objects using neural networks. By the end of this OpenCV Python book, you’ll have gained hands-on experience and become proficient at developing advanced computer vision apps according to specific business needs.

Who is this book for?

This book is for intermediate-level OpenCV users who are looking to enhance their skills by developing advanced applications. Familiarity with OpenCV concepts and Python libraries, and basic knowledge of the Python programming language are assumed.

What you will learn

  • Generate real-time visual effects using filters and image manipulation techniques such as dodging and burning
  • Recognize hand gestures in real-time and perform hand-shape analysis based on the output of a Microsoft Kinect sensor
  • Learn feature extraction and feature matching to track arbitrary objects of interest
  • Reconstruct a 3D real-world scene using 2D camera motion and camera reprojection techniques
  • Detect faces using a cascade classifier and identify emotions in human faces using multilayer perceptrons
  • Classify, localize, and detect objects with deep neural networks

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Mar 20, 2020
Length: 366 pages
Edition : 2nd
Language : English
ISBN-13 : 9781789617634
Category :
Languages :
Tools :

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
OR
Modal Close icon
Payment Processing...
tick Completed

Billing Address

Product Details

Publication date : Mar 20, 2020
Length: 366 pages
Edition : 2nd
Language : English
ISBN-13 : 9781789617634
Category :
Languages :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
€18.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
€189.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts
€264.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total 112.97
Python Image Processing Cookbook
€37.99
OpenCV 4 with Python Blueprints
€36.99
Learning OpenCV 4 Computer Vision with Python 3
€37.99
Total 112.97 Stars icon
Banner background image

Table of Contents

13 Chapters
Fun with Filters Chevron down icon Chevron up icon
Hand Gesture Recognition Using a Kinect Depth Sensor Chevron down icon Chevron up icon
Finding Objects via Feature Matching and Perspective Transforms Chevron down icon Chevron up icon
3D Scene Reconstruction Using Structure from Motion Chevron down icon Chevron up icon
Using Computational Photography with OpenCV Chevron down icon Chevron up icon
Tracking Visually Salient Objects Chevron down icon Chevron up icon
Learning to Recognize Traffic Signs Chevron down icon Chevron up icon
Learning to Recognize Facial Emotions Chevron down icon Chevron up icon
Learning to Classify and Localize Objects Chevron down icon Chevron up icon
Learning to Detect and Track Objects Chevron down icon Chevron up icon
Profiling and Accelerating Your Apps Chevron down icon Chevron up icon
Setting Up a Docker Container Chevron down icon Chevron up icon
Other Books You May Enjoy Chevron down icon Chevron up icon

Customer reviews

Rating distribution
Full star icon Full star icon Full star icon Full star icon Full star icon 5
(4 Ratings)
5 star 100%
4 star 0%
3 star 0%
2 star 0%
1 star 0%
David Buniatyan Nov 20, 2020
Full star icon Full star icon Full star icon Full star icon Full star icon 5
The book goes through application-specific examples and brings advanced concepts within their context. Ideal for practitioners who are interested in quickly seeing results and learning computer vision techniques as they build.
Amazon Verified review Amazon
David Oct 01, 2020
Full star icon Full star icon Full star icon Full star icon Full star icon 5
I really enjoyed the book. As a Machine Learning R&D engineer with about 5 years of experience I think it is well written and best suited for anyone who interested in computer vision, but mostly for the ones who are looking to build their own app. Let me go through the points I liked:- Each chapter is built around an independent application, such as Finding an Object or Facial Emotions detection. This opens a reader the possibility to start right off from the problem they have in mind.- The book describes neatly and with a good visual support important ML algorithms such as SVM, backpropagation, CNNs- The book contains guides and instructions not only from the algorithmic side, but from the developer's as well. Thus setting up an environment will not cause any problems.
Amazon Verified review Amazon
Matthew Emerick Apr 17, 2020
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Disclaimer: The publisher asked me to review this book and gave me a review copy. I promise to be 100% honest in how I feel about this book, both the good and the less so.Overview:This book is for the intermediate or advanced Python programmer who is interested in computer vision with OpenCV. It's a computer vision book that lets you build projects to gain hands on experience in the field as well as a portfolio to show off to get hired. Outside of the libraries the book lists, you will need a depth sensor for the project in Chapter 2. If you can't or won't buy one, you can still do the remaining projects.What I Like:The very first chapter gives a good introduction to using OpenCV with some traditional computer vision work. This gives the reader some confidence when moving forward. The rest of the book flows from intermediate to advanced quite well with increasingly more difficult programs. Each chapter's project focuses on a different ability of the OpenCV library.What I Didn't LikeThere isn't much to dislike about this book. I do wish that there were more projects, I'm a bit of a sucker for that. I also wish that there were suggestions as how to further extend the code or additional code exercises. That would have been great for the programmers transitioning from students with textbooks to software developers.What I Would Like to SeeI touched upon this in the last section, but the book could have been longer. Or it could have been broken up into two books of the same length, one for intermediate programmer and the other for advanced. Thankfully, there are other books to fill in the space, as well as plenty to find online.Overall, this is a great book. I give it an easy 5 stars out of 5. It does exactly what is says it is going to and fills a gap that I had been trying to fill. Well done!
Amazon Verified review Amazon
SJ Sep 07, 2020
Full star icon Full star icon Full star icon Full star icon Full star icon 5
The authors did a good job of explaining various applications that can be build using OpenCV. OpenCV is a widely used Vision-based library but finding a proper guide that can explain all applications is hard to find. If you are looking for a place for a good start to learn traditional vision and write them as code, this is a book to go.
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

How do I buy and download an eBook? Chevron down icon Chevron up icon

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the ebook to be usable for you the reader with our needs to protect the rights of us as Publishers and of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website? Chevron down icon Chevron up icon

If you want to purchase a video course, eBook or Bundle (Print+eBook) please follow below steps:

  1. Register on our website using your email address and the password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Cart, or PayPal)
Where can I access support around an eBook? Chevron down icon Chevron up icon
  • If you experience a problem with using or installing Adobe Reader, the contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats do Packt support? Chevron down icon Chevron up icon

Our eBooks are currently available in a variety of formats such as PDF and ePubs. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks? Chevron down icon Chevron up icon
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower price than print
  • They save resources and space
What is an eBook? Chevron down icon Chevron up icon

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply login to your account and click on the link in Your Download Area. We recommend you saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.