OpenCV 4 with Python Blueprints

Build creative computer vision projects with the latest version of OpenCV 4 and Python 3

Product type Paperback
Published in Mar 2020
Publisher Packt
ISBN-13 9781789801811
Length 366 pages
Edition 2nd Edition
Authors: Michael Beyeler, Dr. Menua Gevorgyan, Arsen Mamikonyan

Setting up the app

Before we can get down to the nitty-gritty of our gesture recognition algorithm, we need to make sure that we can access the depth sensor and display a stream of depth frames. In this section, we will cover the following topics, which will help us set up the app:

  • Accessing the Kinect 3D sensor
  • Utilizing OpenNI-compatible sensors
  • Running the app and main function routine

First, we will look at how to use the Kinect 3D sensor.

Accessing the Kinect 3D sensor

The easiest way to access a Kinect sensor is by using an OpenKinect module called freenect. For installation instructions, take a look at the preceding section.

The freenect module has functions such as sync_get_depth() and sync_get_video(), used to obtain images synchronously from the depth sensor and camera sensor respectively. For this chapter, we will need only the Kinect depth map, which is a single-channel (grayscale) image in which each pixel value is the estimated distance from the camera to a particular surface in the visual scene.

Here, we will design a function that will read a frame from the sensor and convert it to the desired format, and return the frame together with a success status, as follows:

def read_frame() -> Tuple[bool, np.ndarray]:

The function consists of the following steps:

  1. Grab a frame; terminate the function if a frame was not acquired, like this:
    depth, timestamp = freenect.sync_get_depth()
    if depth is None:
        return False, None

The sync_get_depth function returns both the depth map and a timestamp. By default, the map is in an 11-bit format. The last 10 bits of each value describe the depth, while the first bit, when it is equal to 1, indicates that the distance estimation was not successful.

  2. It is a good idea to standardize the data into an 8-bit format. An 11-bit format cannot be visualized with cv2.imshow right away, and in the future we might want to use a different sensor that returns data in a different format. The conversion is done as follows:
    np.clip(depth, 0, 2**10-1, depth)
    depth >>= 2

In the previous code, we first clipped the values to 1,023 (or 2**10-1) so that they fit in 10 bits. This clipping assigns every undetected distance to the farthest possible point. Next, we shifted the values 2 bits to the right so that the distance fits in 8 bits.

  3. Finally, we convert the image into an 8-bit unsigned integer array and return the result, as follows:
    return True, depth.astype(np.uint8)
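
The three steps above can be tried without a sensor. What follows is a minimal sketch that assumes only NumPy; the helper names split_depth_bits and depth_to_uint8 and the sample values are made up for illustration and are not part of freenect. It follows the bit layout described above, where the first of the 11 bits flags a failed measurement and the last 10 bits carry the depth:

```python
import numpy as np


def split_depth_bits(raw: np.ndarray):
    """Split 11-bit values into an 'invalid' mask and the 10 depth bits."""
    invalid = ((raw >> 10) & 1).astype(bool)  # first (highest) of the 11 bits
    depth10 = raw & 0x3FF                     # last 10 bits
    return invalid, depth10


def depth_to_uint8(raw: np.ndarray) -> np.ndarray:
    """Clip to 10 bits, drop the 2 lowest bits, and return an 8-bit array."""
    depth = raw.astype(np.uint16)             # astype copies, so raw stays intact
    np.clip(depth, 0, 2**10 - 1, out=depth)   # invalid readings become 1023
    depth >>= 2                               # 10 bits -> 8 bits
    return depth.astype(np.uint8)


# Hypothetical raw readings, including one with the 'invalid' bit set (2047).
raw = np.array([[0, 4, 512, 1023, 2047]], dtype=np.uint16)
invalid, depth10 = split_depth_bits(raw)
frame8 = depth_to_uint8(raw)
print(invalid)  # only the last reading is flagged
print(frame8)   # [[  0   1 128 255 255]]
```

Note that the invalid reading ends up with the same value as the farthest valid one, which matches the behavior described for the clipping step.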

Now, the depth image can be visualized as follows:

cv2.imshow("depth", read_frame()[1]) 

Let's see how to use OpenNI-compatible sensors in the next section.

Utilizing OpenNI-compatible sensors

To use an OpenNI-compatible sensor, you must first make sure that OpenNI2 is installed and that your version of OpenCV was built with the support of OpenNI. The build information can be printed as follows:

import cv2
print(cv2.getBuildInformation())

If your version was built with OpenNI support, you will find it under the Video I/O section. Otherwise, you will have to rebuild OpenCV with OpenNI support, which is done by passing the -D WITH_OPENNI2=ON flag to cmake.
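
Because the build information is a long block of text, it can help to filter it. The following is a small sketch under stated assumptions: find_openni_lines is a hypothetical helper, and the sample string is a made-up, abbreviated excerpt used only to exercise it (the real output is much longer and its exact wording may differ between builds):

```python
from typing import List


def find_openni_lines(build_info: str) -> List[str]:
    """Return the lines of a build report that mention OpenNI."""
    return [line.strip() for line in build_info.splitlines()
            if "openni" in line.lower()]


# Made-up excerpt, NOT real output; with OpenCV installed you would pass
# cv2.getBuildInformation() instead.
sample = """
  Video I/O:
    FFMPEG:                      YES
    OpenNI2:                     YES
"""

matches = find_openni_lines(sample)
print(matches if matches else "no OpenNI entry found")
```

An empty result suggests that the build has no OpenNI entry at all, although it is still worth reading the Video I/O section by eye, since an entry may be present but report that support is disabled.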

After the installation process is complete, you can access the sensor similarly to other video input devices, using cv2.VideoCapture. In this app, in order to use an OpenNI-compatible sensor instead of a Kinect 3D sensor, follow these steps:

  1. Create a video capture that connects to your OpenNI-compatible sensor, like this:
    device = cv2.CAP_OPENNI2
    capture = cv2.VideoCapture(device)

If you want to connect to an Asus Xtion, the device variable should be assigned the cv2.CAP_OPENNI2_ASUS value instead.

  2. Change the input frame size to the standard Video Graphics Array (VGA) resolution, as follows:
    capture.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    capture.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
  3. In the previous section, we designed the read_frame function, which accesses the Kinect sensor using freenect. In order to read depth images from the video capture, you have to change that function to the following one:
    def read_frame():
        if not capture.grab():
            return False, None
        # the stream is selected with the flag argument, not the first positional one
        return capture.retrieve(flag=cv2.CAP_OPENNI_DEPTH_MAP)

You will note that we have used the grab and retrieve methods instead of the read method. The reason is that the read method of cv2.VideoCapture is inappropriate when we need to synchronize a set of cameras or a multi-head camera, such as a Kinect.

For such cases, you grab frames from multiple sensors at a certain moment in time with the grab method and then retrieve the data of the sensors of interest with the retrieve method. For example, in your own apps, you might also need to retrieve a BGR frame (standard camera frame), which can be done by passing cv2.CAP_OPENNI_BGR_IMAGE to the retrieve method.
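
The grab-then-retrieve pattern can be sketched as follows. This is an illustration rather than the app's code: read_depth_and_bgr is a hypothetical helper, FakeCapture is a hypothetical stand-in for cv2.VideoCapture so that the sketch runs without a sensor, and the two flag values are arbitrary placeholders:

```python
from typing import Any, Optional, Tuple


def read_depth_and_bgr(capture: Any, depth_flag: int, bgr_flag: int
                       ) -> Tuple[bool, Optional[Any], Optional[Any]]:
    """Grab once, then retrieve two streams that belong to the same instant."""
    if not capture.grab():
        return False, None, None
    ok_depth, depth = capture.retrieve(flag=depth_flag)
    ok_bgr, bgr = capture.retrieve(flag=bgr_flag)
    return bool(ok_depth and ok_bgr), depth, bgr


class FakeCapture:
    """Hypothetical stand-in that mimics grab() and retrieve()."""

    def __init__(self) -> None:
        self.grab_count = 0

    def grab(self) -> bool:
        self.grab_count += 1
        return True

    def retrieve(self, image: Any = None, flag: int = 0) -> Tuple[bool, str]:
        # A real capture would return an image; a label is enough here.
        return True, f"stream {flag} from grab {self.grab_count}"


DEPTH_FLAG, BGR_FLAG = 0, 1  # arbitrary placeholders, not OpenCV constants

capture = FakeCapture()
ok, depth, bgr = read_depth_and_bgr(capture, DEPTH_FLAG, BGR_FLAG)
print(ok, depth, bgr)
```

With a real device, you would pass a cv2.VideoCapture object together with cv2.CAP_OPENNI_DEPTH_MAP and cv2.CAP_OPENNI_BGR_IMAGE. Both frames then come from a single grab call, which is the point of splitting the read into two steps.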

So, now that you can read data from your sensor, let's see how to run the application in the next section.

Running the app and main function routine

The chapter2.py script is responsible for running the app, and it first imports the following modules:

import cv2
import numpy as np
from gestures import recognize
from frame_reader import read_frame

The recognize function is responsible for recognizing a hand gesture, and we will compose it later in this chapter. For convenience, we have also placed the read_frame function that we composed in the previous section in a separate script.

In order to simplify the segmentation task, we will instruct the user to place their hand in the center of the screen. To provide a visual aid for this, we create the following function:

def draw_helpers(img_draw: np.ndarray) -> None:
    # draw some helpers for correctly placing hand
    height, width = img_draw.shape[:2]
    color = (0, 102, 255)
    cv2.circle(img_draw, (width // 2, height // 2), 3, color, 2)
    cv2.rectangle(img_draw, (width // 3, height // 3),
                  (width * 2 // 3, height * 2 // 3), color, 2)

The function draws a rectangle around the image center and highlights the center pixel of the image in orange.
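
To see where these helpers land on a VGA frame, the coordinate arithmetic can be isolated from the drawing calls. The following sketch uses a hypothetical helper_geometry function that repeats the integer divisions from draw_helpers without needing OpenCV:

```python
from typing import Tuple

Point = Tuple[int, int]


def helper_geometry(height: int, width: int) -> Tuple[Point, Point, Point]:
    """Return the center and the rectangle corners as (x, y) points."""
    center = (width // 2, height // 2)
    top_left = (width // 3, height // 3)
    bottom_right = (width * 2 // 3, height * 2 // 3)
    return center, top_left, bottom_right


# A 640 x 480 frame has shape (480, 640), that is, (height, width).
center, top_left, bottom_right = helper_geometry(480, 640)
print(center, top_left, bottom_right)  # (320, 240) (213, 160) (426, 320)
```

The rectangle therefore covers the middle third of the frame in each direction, which is the region where the user is asked to place their hand.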

All the heavy lifting is done by the main function, shown in the following code block:

def main():
    for _, frame in iter(read_frame, (False, None)):

The function iterates over grayscale frames from the Kinect, and, in each iteration, it performs the following steps:

  1. Recognize hand gestures using the recognize function, which returns the estimated number of extended fingers (num_fingers) and an annotated BGR color image, as follows:
        num_fingers, img_draw = recognize(frame)
  2. Call the draw_helpers function on the annotated BGR image in order to provide a visual aid for hand placement, as follows:
        draw_helpers(img_draw)
  3. Finally, the main function draws the number of fingers on the annotated frame, displays results with cv2.imshow, and sets termination criteria, as follows:
        # print number of fingers on image
        cv2.putText(img_draw, str(num_fingers), (30, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255))
        cv2.imshow("frame", img_draw)
        # Exit on escape
        if cv2.waitKey(10) == 27:
            break
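
The loop header of main relies on the two-argument form of Python's built-in iter, which keeps calling a function until it returns a given sentinel value. Here is a minimal sketch of that idiom; make_fake_reader is a hypothetical helper that stands in for the sensor and yields strings instead of images:

```python
from typing import Callable, Iterable, Optional, Tuple


def make_fake_reader(frames: Iterable[str]
                     ) -> Callable[[], Tuple[bool, Optional[str]]]:
    """Build a read_frame-like function that reports failure when it runs out."""
    remaining = iter(frames)

    def read_frame() -> Tuple[bool, Optional[str]]:
        try:
            return True, next(remaining)
        except StopIteration:
            return False, None

    return read_frame


read_frame = make_fake_reader(["frame-0", "frame-1", "frame-2"])

# iter(callable, sentinel) stops as soon as the call returns (False, None).
seen = [frame for _, frame in iter(read_frame, (False, None))]
print(seen)  # ['frame-0', 'frame-1', 'frame-2']
```

With real frames, the comparison against the sentinel is decided by the first tuple element (True versus False), so the image array itself is never compared with None.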

So, now that we have the main script, you will note that the only function that we are missing is the recognize function. In order to track hand gestures, we need to compose this function, which we will do in the next section.
