The automatic detection of an arm—and later, the hand region—could be designed to be arbitrarily complicated, maybe by combining information about the shape and color of an arm or hand. However, using skin color as a determining feature to find hands in visual scenes might fail terribly in poor lighting conditions or when the user is wearing gloves. Instead, we choose to recognize the user's hand by its shape in the depth map.
Allowing hands of all sorts to be present in any region of the image unnecessarily complicates the mission of the present chapter, so we make two simplifying assumptions:
- We will instruct the user of our app to place their hand in front of the center of the screen, with their palm roughly parallel to the plane of the Kinect sensor, so that it is easier to identify the corresponding depth layer of the hand.
- We will also instruct the user to sit roughly 1 to 2 meters away from the Kinect and to slightly extend their arm in front of their body so that the hand will end up in a slightly different depth layer than the arm. However, the algorithm will still work even if the full arm is visible.
In this way, it will be relatively straightforward to segment the image based on the depth layer alone. Otherwise, we would first have to devise a full hand detection algorithm, which would needlessly complicate our task. If you feel adventurous, feel free to do this on your own.
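To make the idea concrete before we work through it properly, here is a minimal sketch of depth-layer segmentation, assuming the depth frame arrives as a single-channel NumPy array of depth readings (for example, in millimeters) in which 0 marks invalid pixels. The function name, the 200-unit band, and the 21x21 center patch are illustrative choices, not values fixed by this chapter:

```python
import numpy as np

def segment_hand_by_depth(depth, band=200):
    """Rough depth-layer segmentation around the image center.

    Assumes `depth` is a single-channel array of depth readings where 0
    marks invalid pixels; `band` is a hypothetical tolerance around the
    detected depth layer.
    """
    h, w = depth.shape[:2]

    # Sample a small patch around the image center, where we asked the
    # user to hold their hand.
    cy, cx = h // 2, w // 2
    patch = depth[cy - 10:cy + 11, cx - 10:cx + 11]

    # Discard invalid (zero) readings before estimating the hand's depth.
    valid = patch[patch > 0]
    if valid.size == 0:
        return np.zeros((h, w), np.uint8)

    # Take the median as the most prominent depth of the center region.
    center_depth = np.median(valid)

    # Keep every valid pixel whose depth lies within the band around
    # that layer; the result is a binary mask of the hand's depth layer.
    close = np.abs(depth.astype(np.float32) - center_depth) <= band
    return np.where(close & (depth > 0), 255, 0).astype(np.uint8)
```

A typical call might look like `mask = segment_hand_by_depth(depth_frame)`, where `depth_frame` comes from whatever Kinect driver you are using; the next section develops this idea in detail.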
Let's see how to find the most prominent depth of the image center region in the next section.