OpenCV 4 with Python Blueprints

Hand Gesture Recognition Using a Kinect Depth Sensor

The goal of this chapter is to develop an app that detects and tracks simple hand gestures in real time, using the output of a depth sensor, such as that of a Microsoft Kinect 3D sensor or an ASUS Xtion sensor. The app will analyze each captured frame to perform the following tasks:

Hand region segmentation: The user's hand region will be extracted in each frame by analyzing the depth map output of the Kinect sensor, which is done by thresholding, applying some morphological operations, and finding connected components.
Hand shape analysis: The shape of the segmented hand region will be analyzed by determining contours, convex hull, and convexity defects.
Hand gesture recognition: The number of extended fingers will be determined based on the hand contour's convexity defects, and the gesture will be classified accordingly (with no extended fingers corresponding to a fist, and five extended fingers corresponding to an open hand).

Gesture recognition is an ever-popular topic in computer science. This is because it not only enables humans to communicate with machines (Human-Machine Interaction (HMI)) but also constitutes the first step for machines to begin understanding human body language. With affordable sensors such as Microsoft Kinect or Asus Xtion and open source software such as OpenKinect and OpenNI, it has never been easier to get started in the field yourself. So, what shall we do with all this technology?

In this chapter, we will cover the following topics:

Planning the app
Setting up the app
Tracking hand gestures in real time
Understanding hand region segmentation
Performing hand shape analysis
Performing hand gesture recognition

The beauty of the algorithm that we are going to implement in this chapter is that it works well for many hand gestures, yet it is simple enough to run in real time on a generic laptop. Also, if we want, we can easily extend it to incorporate more complicated hand-pose estimations.

Once you complete the app, you will understand how to use depth sensors in your own apps. You will learn how to compose shapes of interest with OpenCV from the depth information, as well as understanding how to analyze shapes with OpenCV, using their geometric properties.

Setting up the app

Before we can get down to the nitty-gritty of our gesture recognition algorithm, we need to make sure that we can access the depth sensor and display a stream of depth frames. In this section, we will cover the following things that will help us set up the app:

Accessing the Kinect 3D sensor
Utilizing OpenNI-compatible sensors
Running the app and main function routine

First, we will look at how to use the Kinect 3D sensor.

Accessing the Kinect 3D sensor

The easiest way to access a Kinect sensor is by using an OpenKinect module called freenect. For installation instructions, take a look at the preceding section.

The freenect module has functions such as sync_get_depth() and sync_get_video(), used to obtain images synchronously from the depth sensor and camera sensor respectively. For this chapter, we will need only the Kinect depth map, which is a single-channel (grayscale) image in which each pixel value is the estimated distance from the camera to a particular surface in the visual scene.

Here, we will design a function that will read a frame from the sensor and convert it to the desired format, and return the frame together with a success status, as follows:

def read_frame(): -> Tuple[bool,np.ndarray]:

The function consists of the following steps:

Grab a frame; terminate the function if a frame was not acquired, like this:

    frame, timestamp = freenect.sync_get_depth() 
    if frame is None:
        return False, None

The sync_get_depth method returns both the depth map and a timestamp. By default, the map is in an 11-bit format. The last 10 bits of the sensor describes the depth, while the first bit states that the distance estimation was not successful when it's equal to 1.

It is a good idea to standardize the data into an 8-bit precision format, as an 11-bit format is inappropriate to be visualized with cv2.imshow right away, as well as in the future. We might want to use some different sensor that returns in a different format, as follows:

np.clip(depth, 0, 2**10-1, depth) 
depth >>= 2

In the previous code, we have first clipped the values to 1,023 (or 2**10-1) to fit in 10 bits. Such clipping results in the assignment of the undetected distance to the farthest possible point. Next, we shift 2 bits to the right to fit the distance in 8 bits.

Finally, we convert the image into an 8-bit unsigned integer array and return the result, as follows:

return True, depth.astype(np.uint8)

Now, the depth image can be visualized as follows:

cv2.imshow("depth", read_frame()[1])

Let's see how to use OpenNI-compatible sensors in the next section.

Utilizing OpenNI-compatible sensors

To use an OpenNI-compatible sensor, you must first make sure that OpenNI2 is installed and that your version of OpenCV was built with the support of OpenNI. The build information can be printed as follows:

import cv2
print(cv2.getBuildInformation())

If your version was built with OpenNI support, you will find it under the Video I/O section. Otherwise, you will have to rebuild OpenCV with OpenNI support, which is done by passing the -D WITH_OPENNI2=ON flag to cmake.

After the installation process is complete, you can access the sensor similarly to other video input devices, using cv2.VideoCapture. In this app, in order to use an OpenNI-compatible sensor instead of a Kinect 3D sensor, you have to cover the following steps:

Create a video capture that connects to your OpenNI-compatible sensor, like this:

device = cv2.cv.CV_CAP_OPENNI 
capture = cv2.VideoCapture(device)

If you want to connect to Asus Xtion, the device variable should be assigned to the cv2.CV_CAP_OPENNI_ASUS value instead.

Change the input frame size to the standard Video Graphics Array (VGA) resolution, as follows:

capture.set(cv2.cv.CV_CAP_PROP_FRAME_WIDTH, 640) 
capture.set(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT, 480)

In the previous section, we designed the read_frame function, which accesses the Kinect sensor using freenect. In order to read depth images from the video capture, you have to change that function to the following one:

def read_frame():
    if not capture.grab():
        return False,None
    return capture.retrieve(cv2.CAP_OPENNI_DEPTH_MAP)

You will note that we have used the grab and retrieve methods instead of the read method. The reason is that the read method of cv2.VideoCapture is inappropriate when we need to synchronize a set of cameras or a multi-head camera, such as a Kinect.

For such cases, you grab frames from multiple sensors at a certain moment in time with the grab method and then retrieve the data of the sensors of interest with the retrieve method. For example, in your own apps, you might also need to retrieve a BGR frame (standard camera frame), which can be done by passing cv2.CAP_OPENNI_BGR_IMAGE to the retrieve method.

So, now that you can read data from your sensor, let's see how to run the application in the next section.

Running the app and main function routine

The chapter2.py script is responsible for running the app, and it first imports the following modules:

import cv2
import numpy as np
from gestures import recognize
from frame_reader import read_frame

The recognize function is responsible for recognizing a hand gesture, and we will compose it later in this chapter. We have also placed the read_frame method that we composed in the previous section in a separate script, for convenience.

In order to simplify the segmentation task, we will instruct the user to place their hand in the center of the screen. To provide a visual aid for this, we create the following function:

def draw_helpers(img_draw: np.ndarray) -> None:
    # draw some helpers for correctly placing hand
    height, width = img_draw.shape[:2]
    color = (0,102,255)
    cv2.circle(img_draw, (width // 2, height // 2), 3, color, 2)
    cv2.rectangle(img_draw, (width // 3, height // 3),
                  (width * 2 // 3, height * 2 // 3), color, 2)

The function draws a rectangle around the image center and highlights the center pixel of the image in orange.

All the heavy lifting is done by the main function, shown in the following code block:

def main():
    for _, frame in iter(read_frame, (False, None)):

The function iterates over grayscale frames from Kinect, and, in each iteration, it covers the following steps:

Recognize hand gestures using the recognize function, which returns the estimated number of extended fingers (num_fingers) and an annotated BGR color image, as follows:

num_fingers, img_draw = recognize(frame)

Call the draw_helpers function on the annotated BGR image in order to provide a visual aid for hand placement, as follows:

 draw_helpers(img_draw)

Finally, the main function draws the number of fingers on the annotated frame, displays results with cv2.imshow, and sets termination criteria, as follows:

        # print number of fingers on image
        cv2.putText(img_draw, str(num_fingers), (30, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255))
        cv2.imshow("frame", img_draw)
        # Exit on escape
        if cv2.waitKey(10) == 27:
            break

So, now that we have the main script, you will note that the only function that we are missing is the recognize function. In order to track hand gestures, we need to compose this function, which we will do in the next section.

Understanding hand region segmentation

The automatic detection of an arm—and later, the hand region—could be designed to be arbitrarily complicated, maybe by combining information about the shape and color of an arm or hand. However, using skin color as a determining feature to find hands in visual scenes might fail terribly in poor lighting conditions or when the user is wearing gloves. Instead, we choose to recognize the user's hand by its shape in the depth map.

Allowing hands of all sorts to be present in any region of the image unnecessarily complicates the mission of the present chapter, so we make two simplifying assumptions:

We will instruct the user of our app to place their hand in front of the center of the screen, orienting their palm roughly parallel to the orientation of the Kinect sensor so that it is easier to identify the corresponding depth layer of the hand.
We will also instruct the user to sit roughly 1 to 2 meters away from the Kinect and to slightly extend their arm in front of their body so that the hand will end up in a slightly different depth layer than the arm. However, the algorithm will still work even if the full arm is visible.

In this way, it will be relatively straightforward to segment the image based on the depth layer alone. Otherwise, we would have to come up with a hand detection algorithm first, which would unnecessarily complicate our mission. If you feel adventurous, feel free to do this on your own.

Let's see how to find the most prominent depth of the image center region in the next section.

Finding the most prominent depth of the image center region

Once the hand is placed roughly in the center of the screen, we can start finding all image pixels that lie on the same depth plane as the hand. This is done by following these steps:

First, we simply need to determine the most prominent depth value of the center region of the image. The simplest approach would be to look only at the depth value of the center pixel, like this:

width, height = depth.shape 
center_pixel_depth = depth[width/2, height/2]

Then, create a mask in which all pixels at a depth of center_pixel_depth are white and all others are black, as follows:

import numpy as np 
 
depth_mask = np.where(depth == center_pixel_depth, 255, 
     0).astype(np.uint8)

However, this approach will not be very robust, because there is the chance that it will be compromised by the following:

Your hand will not be placed perfectly parallel to the Kinect sensor.
Your hand will not be perfectly flat.
The Kinect sensor values will be noisy.

Therefore, different regions of your hand will have slightly different depth values.

The segment_arm method takes a slightly better approach—it looks at a small neighborhood in the center of the image and determines the median depth value. This is done by following these steps:

First, we find the center region (for example, 21 x 21 pixels) of the image frame, like this:

def segment_arm(frame: np.ndarray, abs_depth_dev: int = 14) -> np.ndarray:
    height, width = frame.shape
    # find center (21x21 pixels) region of imageheight frame
    center_half = 10 # half-width of 21 is 21/2-1
    center = frame[height // 2 - center_half:height // 2 + center_half,
                   width // 2 - center_half:width // 2 + center_half]

Then, we determine the median depth value, med_val, as follows:

med_val = np.median(center)

We can now compare med_val with the depth value of all pixels in the image and create a mask in which all pixels whose depth values are within a particular range [med_val-abs_depth_dev, med_val+abs_depth_dev] are white, and all other pixels are black.

However, for reasons that will become clear in a moment, let's paint the pixels gray instead of white, like this:

frame = np.where(abs(frame - med_val) <= abs_depth_dev,
                 128, 0).astype(np.uint8)

The result will look like this:

You will note that the segmentation mask is not smooth. In particular, it contains holes at points where the depth sensor failed to make a prediction. Let's learn how to apply morphological closing to smoothen the segmentation mask, in the next section.

Applying morphological closing for smoothening

A common problem with segmentation is that a hard threshold typically results in small imperfections (that is, holes, as in the preceding image) in the segmented region. These holes can be alleviated by using morphological opening and closing. When it is opened, it removes small objects from the foreground (assuming that the objects are bright on a dark foreground), whereas closing removes small holes (dark regions).

This means that we can get rid of the small black regions in our mask by applying morphological closing (dilation followed by erosion) with a small 3 x 3-pixel kernel, as follows:

kernel = np.ones((3, 3), np.uint8)
frame = cv2.morphologyEx(frame, cv2.MORPH_CLOSE, kernel)

The result looks a lot smoother, as follows:

Notice, however, that the mask still contains regions that do not belong to the hand or arm, such as what appears to be one of the knees on the left and some furniture on the right. These objects just happen to be on the same depth layer of my arm and hand. If possible, we could now combine the depth information with another descriptor, maybe a texture- or skeleton-based hand classifier that would weed out all non-skin regions.

An easier approach is to realize that most of the time, hands are not connected to knees or furniture. Let's learn how to find connected components in a segmentation mask.

Finding connected components in a segmentation mask

We already know that the center region belongs to the hand. For such a scenario, we can simply apply cv2.floodfill to find all the connected image regions.

Before we do this, we want to be absolutely certain that the seed point for the flood fill belongs to the right mask region. This can be achieved by assigning a grayscale value of 128 to the seed point. However, we also want to make sure that the center pixel does not, by any coincidence, lie within a cavity that the morphological operation failed to close.

So, let's set a small 7 x 7-pixel region with a grayscale value of 128 instead, like this:

small_kernel = 3
frame[height // 2 - small_kernel:height // 2 + small_kernel,
      width // 2 - small_kernel:width // 2 + small_kernel] = 128

As flood filling (as well as morphological operations) is potentially dangerous, OpenCV requires the specification of a mask that avoids flooding the entire image. This mask has to be 2 pixels wider and taller than the original image and has to be used in combination with the cv2.FLOODFILL_MASK_ONLY flag.

It can be very helpful to constrain the flood filling to a small region of the image or a specific contour so that we need not connect two neighboring regions that should never have been connected in the first place. It's better to be safe than sorry, right?

Nevertheless, today, we feel courageous! Let's make the mask entirely black, like this:

mask = np.zeros((height + 2, width + 2), np.uint8)

Then, we can apply the flood fill to the center pixel (the seed point), and paint all the connected regions white, as follows:

flood = frame.copy()
cv2.floodFill(flood, mask, (width // 2, height // 2), 255,
              flags=4 | (255 << 8))

At this point, it should be clear why we decided to start with a gray mask earlier. We now have a mask that contains white regions (arm and hand), gray regions (neither arm nor hand, but other things in the same depth plane), and black regions (all others). With this setup, it is easy to apply a simple binary threshold to highlight only the relevant regions of the pre-segmented depth plane, as follows:

ret, flooded = cv2.threshold(flood, 129, 255, cv2.THRESH_BINARY)

This is what the resulting mask looks like:

The resulting segmentation mask can now be returned to the recognize function, where it will be used as an input to the find_hull_defects function, as well as a canvas for drawing the final output image (img_draw). The function analyzes the shape of a hand in order to detect the defects of a hull that corresponds to the hand. Let's learn how to perform hand shape analysis in the next section.

Performing hand shape analysis

Now that we know (roughly) where the hand is located, we aim to learn something about its shape. In this app, we will make a decision on which exact gesture is shown, based on convexity defects of a contour corresponding to the hand.

Let's move on and learn how to determine the contour of the segmented hand region in the next section, which will be the first step in our hand shape analysis.

Determining the contour of the segmented hand region

The first step involves determining the contour of the segmented hand region. Luckily, OpenCV comes with a pre-canned version of such an algorithm—cv2.findContours. This function acts on a binary image and returns a set of points that are believed to be part of the contour. As there might be multiple contours present in the image, it is possible to retrieve an entire hierarchy of contours, as follows:

def find_hull_defects(segment: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    contours, hierarchy = cv2.findContours(segment, cv2.RETR_TREE,
                                           cv2.CHAIN_APPROX_SIMPLE)

Furthermore, because we do not know which contour we are looking for, we have to make an assumption to clean up the contour result, since it is possible that some small cavities are left over even after the morphological closing. However, we are fairly certain that our mask contains only the segmented area of interest. We will assume that the largest contour found is the one that we are looking for.

Thus, we simply traverse the list of contours, calculate the contour area (cv2.contourArea), and store only the largest one (max_contour), like this:

max_contour = max(contours, key=cv2.contourArea)

The contour that we found might still have too many corners. We approximate the contour with a similar contour that does not have sides that are less than 1 percent of the perimeter of the contour, like this:

epsilon = 0.01 * cv2.arcLength(max_contour, True)
max_contour = cv2.approxPolyDP(max_contour, epsilon, True)

Let's learn how to find the convex hull of a contour area, in the next section.

Finding the convex hull of a contour area

Once we have identified the largest contour in our mask, it is straightforward to compute the convex hull of the contour area. The convex hull is basically the envelope of the contour area. If you think of all the pixels that belong to the contour area as a set of nails poking out of a board, then a tight rubber band encircles all the nails forming the convex hull shape. We can get the convex hull directly from our largest contour (max_contour), like this:

hull = cv2.convexHull(max_contour, returnPoints=False)

As we now want to look at convexity deficits in this hull, we are instructed by the OpenCV documentation to set the returnPoints optional flag to False.

The convex hull drawn in yellow around a segmented hand region looks like this:

As mentioned previously, we will determine a hand gesture based on convexity defects. Let's move on and learn how to find the convexity defects of a convex hull in the next section, which will bring us one step closer to recognizing hand gestures.

Finding the convexity defects of a convex hull

As is evident from the preceding screenshot, not all points on the convex hull belong to the segmented hand region. In fact, all the fingers and the wrist cause severe convexity defects—that is, points of the contour that are far away from the hull.

We can find these defects by looking at both the largest contour (max_contour) and the corresponding convex hull (hull), as follows:

defects = cv2.convexityDefects(max_contour, hull)

The output of this function (defects) is a NumPy array containing all defects. Each defect is an array of four integers that are start_index (index of the point in the contour where the defect begins), end_index (index of the point in the contour where the defect ends), farthest_pt_index (the index of the farthest point from the convex hull within the defect), and fixpt_depth (the distance between the farthest point and the convex hull).

We will make use of this information in just a moment when we try to estimate the number of extended fingers.

For now, though, our job is done. The extracted contour (max_contour) and convexity defects (defects) can be returned to recognize, where they will be used as inputs to detect_num_fingers, as follows:

return max_contour, defects

So, now that we have found the defects, let's move on and learn how to perform hand gesture recognition using the convexity defects, which will bring us toward the completion of the app.

Performing hand gesture recognition

What remains to be done is to classify the hand gesture based on the number of extended fingers. For example, if we find five extended fingers, we assume the hand to be open, whereas no extended fingers implies a fist. All that we are trying to do is count from zero to five, and make the app recognize the corresponding number of fingers.

This is actually trickier than it might seem at first. For example, people in Europe might count to three by extending their thumb, index finger, and middle finger. If you do that in the US, people there might get horrendously confused, because they do not tend to use their thumbs when signaling the number two.

This might lead to frustration, especially in restaurants (trust me). If we could find a way to generalize these two scenarios—maybe by appropriately counting the number of extended fingers, we would have an algorithm that could teach simple hand gesture recognition to not only a machine but also (maybe) to a person of average intellect.

As you might have guessed, the answer is related to convexity defects. As mentioned earlier, extended fingers cause defects in the convex hull. However, the inverse is not true; that is, not all convexity defects are caused by fingers! There might be additional defects caused by the wrist, as well as the overall orientation of the hand or the arm. How can we distinguish between these different causes of defects?

Let's distinguish between different cases of convexity defects, in the next section.

Distinguishing between different causes of convexity defects

The trick is to look at the angle between the farthest point from the convex hull point within the defect (farthest_pt_index) and the start and endpoints of the defect (start_index and end_index, respectively), as illustrated in the following screenshot:

In the previous screenshot, the orange markers serve as a visual aid to center the hand in the middle of the screen, and the convex hull is outlined in green. Each red dot corresponds to the point farthest from the convex hull (farthest_pt_index) for every convexity defect detected. If we compare a typical angle that belongs to two extended fingers (such as θj) to an angle that is caused by general hand geometry (such as θi), we notice that the former is much smaller than the latter.

This is obviously because humans can spread their fingers only a little, thus creating a narrow angle made by the farthest defect point and the neighboring fingertips. Therefore, we can iterate over all convexity defects and compute the angle between the said points. For this, we will need a utility function that calculates the angle (in radians) between two arbitrary—a list like vectors, v1 and v2, as follows:

def angle_rad(v1, v2): 
    return np.arctan2(np.linalg.norm(np.cross(v1, v2)), 
         np.dot(v1, v2))

This method uses the cross product to compute the angle, rather than doing it in the standard way. The standard way of calculating the angle between two vectors v1 and v2 is by calculating their dot product and dividing it by the norm of v1 and the norm of v2. However, this method has two imperfections:

You have to manually avoid division by zero if either the norm of v1 or the norm of v2 is zero.
The method returns relatively inaccurate results for small angles.

Similarly, we provide a simple function to convert an angle from degrees to radians, illustrated here:

def deg2rad(angle_deg): 
    return angle_deg/180.0*np.pi

In the next section, we'll see how to classify hand gestures based on the number of extended fingers.

Classifying hand gestures based on the number of extended fingers

What remains to be done is to actually classify the hand gesture based on the number of instances of extended fingers. The classification is done using the following function:

def detect_num_fingers(contour: np.ndarray, defects: np.ndarray,
                       img_draw: np.ndarray, thresh_deg: float = 80.0) -> Tuple[int, np.ndarray]:

The function accepts the detected contour (contour), the convexity defects (defects), a canvas to draw on (img_draw), and a cutoff angle that can be used as a threshold to classify whether convexity defects are caused by extended fingers or not (thresh_deg).

Except for the angle between the thumb and the index finger, it is rather hard to get anything close to 90 degrees, so anything close to that number should work. We do not want the cutoff angle to be too high, because that might lead to errors in classifications. The complete function will return the number of fingers and the illustration frame, and consists of the following steps:

First, let's focus on special cases. If we do not find any convexity defects, it means that we possibly made a mistake during the convex hull calculation, or there are simply no extended fingers in the frame, so we return 0 as the number of detected fingers, as follows:

if defects is None: 
    return [0, img_draw]

However, we can take this idea even further. Due to the fact that arms are usually slimmer than hands or fists, we can assume that the hand geometry will always generate at least two convexity defects (which usually belong to the wrists). So, if there are no additional defects, it implies that there are no extended fingers:

if len(defects) <= 2: 
    return [0, img_draw]

Now that we have ruled out all special cases, we can begin counting real fingers. If there is a sufficient number of defects, we will find a defect between every pair of fingers. Thus, in order to get the number right (num_fingers), we should start counting at 1, like this:

num_fingers = 1

Then, we start iterating over all convexity defects. For each defect, we extract the three points and draw its hull for visualization purposes, as follows:

# Defects are of shape (num_defects,1,4)
for defect in defects[:, 0, :]:
    # Each defect is an array of four integers.
    # First three indexes of start, end and the furthest
    # points respectively
    start, end, far = [contour[i][0] for i in defect[:3]]
    # draw the hull
    cv2.line(img_draw, tuple(start), tuple(end), (0, 255, 0), 2)

Then, we compute the angle between the two edges from far to start and from far to end. If the angle is smaller than thresh_deg degrees, it means that we are dealing with a defect that is most likely caused by two extended fingers. In such cases, we want to increment the number of detected fingers (num_fingers) and draw the point with green. Otherwise, we draw the point with red, as follows:

# if angle is below a threshold, defect point belongs to two
# extended fingers
if angle_rad(start - far, end - far) < deg2rad(thresh_deg):
    # increment number of fingers
    num_fingers += 1

    # draw point as green
    cv2.circle(img_draw, tuple(far), 5, (0, 255, 0), -1)
else:
    # draw point as red
    cv2.circle(img_draw, tuple(far), 5, (0, 0, 255), -1)

After iterating over all convexity defects, we return the number of detected fingers and the assembled output image, like this:

return min(5, num_fingers), img_draw

Computing the minimum will make sure that we do not exceed the common number of fingers per hand.

The result can be seen in the following screenshot:

Interestingly, our app is able to detect the correct number of extended fingers in a variety of hand configurations. Defect points between extended fingers are easily classified as such by the algorithm, and others are successfully ignored.

David Buniatyan Nov 20, 2020

The book goes through application-specific examples and brings advanced concepts within their context. Ideal for practitioners who are interested in quickly seeing results and learning computer vision techniques as they build.

Amazon Verified review

David Oct 01, 2020

I really enjoyed the book. As a Machine Learning R&D engineer with about 5 years of experience I think it is well written and best suited for anyone who interested in computer vision, but mostly for the ones who are looking to build their own app. Let me go through the points I liked:- Each chapter is built around an independent application, such as Finding an Object or Facial Emotions detection. This opens a reader the possibility to start right off from the problem they have in mind.- The book describes neatly and with a good visual support important ML algorithms such as SVM, backpropagation, CNNs- The book contains guides and instructions not only from the algorithmic side, but from the developer's as well. Thus setting up an environment will not cause any problems.

Matthew Emerick Apr 17, 2020

Disclaimer: The publisher asked me to review this book and gave me a review copy. I promise to be 100% honest in how I feel about this book, both the good and the less so.Overview:This book is for the intermediate or advanced Python programmer who is interested in computer vision with OpenCV. It's a computer vision book that lets you build projects to gain hands on experience in the field as well as a portfolio to show off to get hired. Outside of the libraries the book lists, you will need a depth sensor for the project in Chapter 2. If you can't or won't buy one, you can still do the remaining projects.What I Like:The very first chapter gives a good introduction to using OpenCV with some traditional computer vision work. This gives the reader some confidence when moving forward. The rest of the book flows from intermediate to advanced quite well with increasingly more difficult programs. Each chapter's project focuses on a different ability of the OpenCV library.What I Didn't LikeThere isn't much to dislike about this book. I do wish that there were more projects, I'm a bit of a sucker for that. I also wish that there were suggestions as how to further extend the code or additional code exercises. That would have been great for the programmers transitioning from students with textbooks to software developers.What I Would Like to SeeI touched upon this in the last section, but the book could have been longer. Or it could have been broken up into two books of the same length, one for intermediate programmer and the other for advanced. Thankfully, there are other books to fill in the space, as well as plenty to find online.Overall, this is a great book. I give it an easy 5 stars out of 5. It does exactly what is says it is going to and fills a gap that I had been trying to fill. Well done!

SJ Sep 07, 2020

The authors did a good job of explaining various applications that can be build using OpenCV. OpenCV is a widely used Vision-based library but finding a proper guide that can explain all applications is hard to find. If you are looking for a place for a good start to learn traditional vision and write them as code, this is a book to go.

OpenCV 4 with Python Blueprints: Build creative computer vision projects with the latest version of OpenCV 4 and Python 3 , Second Edition

What do you get with Print?

OpenCV 4 with Python Blueprints

Hand Gesture Recognition Using a Kinect Depth Sensor

Getting started

Planning the app

Setting up the app

Accessing the Kinect 3D sensor

Utilizing OpenNI-compatible sensors

Running the app and main function routine

Tracking hand gestures in real time

Understanding hand region segmentation

Finding the most prominent depth of the image center region

Applying morphological closing for smoothening

Finding connected components in a segmentation mask

Performing hand shape analysis

Determining the contour of the segmented hand region

Finding the convex hull of a contour area

Finding the convexity defects of a convex hull

Performing hand gesture recognition

Distinguishing between different causes of convexity defects

Classifying hand gestures based on the number of extended fingers

Summary

Page 1 of 9

Key benefits

Description

Who is this book for?

What you will learn

Product Details

What do you get with Print?

Product Details

Frequently bought together

Table of Contents

Recommendations for you

Customer reviews

People who bought this also bought

About the 4 authors

FAQs

OpenCV 4 with Python Blueprints: Build creative computer vision projects with the latest version of OpenCV 4 and Python 3 , Second Edition

What do you get with Print?

Key benefits

Description

Who is this book for?

What you will learn

Product Details

What do you get with Print?

Product Details

Packt Subscriptions

Frequently bought together

Table of Contents

Recommendations for you

Customer reviews

People who bought this also bought

About the 4 authors

FAQs