
Learning OpenCV 4 Computer Vision with Python 3: Get to grips with tools, techniques, and algorithms for computer vision and machine learning, Third Edition

Joseph Howse, Joe Minichino
Paperback, Feb 2020, 372 pages, 3rd Edition

Handling Files, Cameras, and GUIs

Installing OpenCV and running samples is fun, but at this stage, we want to try things out in our own way. This chapter introduces OpenCV's I/O functionality. We also discuss the concept of a project and the beginnings of an object-oriented design for this project, which we will flesh out in subsequent chapters.

By starting with a look at I/O capabilities and design patterns, we will build our project in the same way we would make a sandwich: from the outside in. Bread slices and spread, or endpoints and glue, come before fillings or algorithms. We choose this approach because computer vision is mostly extroverted (it contemplates the real world outside our computer), and we want to apply all of our subsequent algorithmic work to the real world through a common interface.

Specifically, in this chapter, our code samples and discussions will cover the following tasks:

  • Reading images from image files, video files, camera devices, or raw bytes of data in memory
  • Writing images to image files or video files
  • Manipulating image data in NumPy arrays
  • Displaying images in windows
  • Handling keyboard and mouse input
  • Implementing an application with an object-oriented design

Technical requirements

Basic I/O scripts

Most CV applications need to get images as input. Most also produce images as output. An interactive CV application might require a camera as an input source and a window as an output destination. However, other possible sources and destinations include image files, video files, and raw bytes. For example, raw bytes might be transmitted via a network connection, or they might be generated by an algorithm if we incorporate procedural graphics into our application. Let's look at each of these possibilities.

Reading/writing an image file

OpenCV provides the imread function to load an image from a file and the imwrite function to write an image to a file. These functions support various file formats for still images (not videos). The supported formats vary—as formats can be added or removed in a custom build of OpenCV—but normally BMP, PNG, JPEG, and TIFF are among the supported formats.

Let's explore the anatomy of the representation of an image in OpenCV and NumPy. An image is a multidimensional array; it has columns and rows of pixels, and each pixel has a value. For different kinds of image data, the pixel value may be formatted in different ways. For example, we can create a 3x3 square black image from scratch by simply creating a 2D NumPy array:

img = numpy.zeros((3, 3), dtype=numpy.uint8)

If we print this image to a console, we obtain the following result:

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]], dtype=uint8)

Here, each pixel is represented by a single 8-bit integer, which means that the values for each pixel are in the 0-255 range, where 0 is black, 255 is white, and the in-between values are shades of gray. This is a grayscale image.

Let's now convert this image into blue-green-red (BGR) format using the cv2.cvtColor function:

img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)

Let's observe how the image has changed:

array([[[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]], dtype=uint8)

As you can see, each pixel is now represented by a three-element array, with each integer representing one of the three color channels: B, G, and R, respectively. Other common color models, such as HSV, will be represented in the same way, albeit with different value ranges. For example, the hue value of the HSV color model has a range of 0-180.

For more information about color models, refer to Chapter 3, Processing Images with OpenCV, specifically the Converting between different color models section.

You can check the structure of an image by inspecting the shape property, which returns rows, columns, and the number of channels (if there is more than one).

Consider this example:

img = numpy.zeros((5, 3), dtype=numpy.uint8)
print(img.shape)

The preceding code will print (5, 3); in other words, we have a grayscale image with 5 rows and 3 columns. If you then converted the image into BGR, the shape would be (5, 3, 3), which indicates the presence of three channels per pixel.
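To check these shapes without loading a file, the following sketch builds the same 5x3 grayscale array and expands it to three channels. It uses numpy.stack as a NumPy-only stand-in for cv2.cvtColor(img, cv2.COLOR_GRAY2BGR); this substitution is an assumption made here so that the snippet runs without OpenCV. For a grayscale input, copying the single channel into B, G, and R should give the same pixel values.

```python
import numpy

gray = numpy.zeros((5, 3), dtype=numpy.uint8)

# Stand-in for cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR): replicate the single
# gray channel along a new last axis to obtain B, G, and R.
bgr = numpy.stack([gray, gray, gray], axis=-1)

print(gray.shape)  # rows, columns
print(bgr.shape)   # rows, columns, channels
```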

Images can be loaded from one file format and saved to another. For example, let's convert an image from PNG into JPEG:

import cv2

image = cv2.imread('MyPic.png')
cv2.imwrite('MyPic.jpg', image)

OpenCV's Python module is called cv2 even though we are using OpenCV 4.x and not OpenCV 2.x. Historically, OpenCV had two Python modules: cv2 and cv. The latter wrapped a legacy version of OpenCV implemented in C. Nowadays, OpenCV has only the cv2 Python module, which wraps the current version of OpenCV implemented in C++.

By default, imread returns an image in the BGR color format even if the file uses a grayscale format. BGR represents the same color model as red-green-blue (RGB), but the byte order is reversed.

Optionally, we may specify the mode of imread. The supported options include the following:

  • cv2.IMREAD_COLOR: This is the default option, providing a 3-channel BGR image with an 8-bit value (0-255) for each channel.
  • cv2.IMREAD_GRAYSCALE: This provides an 8-bit grayscale image.
  • cv2.IMREAD_ANYCOLOR: This provides either an 8-bit-per-channel BGR image or an 8-bit grayscale image, depending on the metadata in the file.
  • cv2.IMREAD_UNCHANGED: This reads all of the image data, including the alpha or transparency channel (if there is one) as a fourth channel.
  • cv2.IMREAD_ANYDEPTH: This loads an image in grayscale at its original bit depth. For example, it provides a 16-bit-per-channel grayscale image if the file represents an image in this format.
  • cv2.IMREAD_ANYDEPTH | cv2.IMREAD_COLOR: This combination loads an image in BGR color at its original bit depth.
  • cv2.IMREAD_REDUCED_GRAYSCALE_2: This loads an image in grayscale at half its original resolution. For example, if the file contains a 640 x 480 image, it is loaded as a 320 x 240 image.
  • cv2.IMREAD_REDUCED_COLOR_2: This loads an image in 8-bit-per-channel BGR color at half its original resolution.
  • cv2.IMREAD_REDUCED_GRAYSCALE_4: This loads an image in grayscale at one-quarter of its original resolution.
  • cv2.IMREAD_REDUCED_COLOR_4: This loads an image in 8-bit-per-channel color at one-quarter of its original resolution.
  • cv2.IMREAD_REDUCED_GRAYSCALE_8: This loads an image in grayscale at one-eighth of its original resolution.
  • cv2.IMREAD_REDUCED_COLOR_8: This loads an image in 8-bit-per-channel color at one-eighth of its original resolution.

As an example, let's load a PNG file as a grayscale image (losing any color information in the process), and then save it as a grayscale PNG image:

import cv2

grayImage = cv2.imread('MyPic.png', cv2.IMREAD_GRAYSCALE)
cv2.imwrite('MyPicGray.png', grayImage)

The path of an image, unless absolute, is relative to the working directory (the path from which the Python script is run), so, in the preceding example, MyPic.png would have to be in the working directory or the image would not be found. If you prefer to avoid assumptions about the working directory, you can use absolute paths, such as C:\Users\Joe\Pictures\MyPic.png on Windows, /Users/Joe/Pictures/MyPic.png on Mac, or /home/joe/pictures/MyPic.png on Linux.

The imwrite() function requires an image to be in the BGR or grayscale format with a certain number of bits per channel that the output format can support. For example, the BMP file format requires 8 bits per channel, while PNG allows either 8 or 16 bits per channel.
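As a rough illustration of this constraint, the following sketch checks an array's data type and layout against the bit depths mentioned above before it would be passed to imwrite. The can_write helper and the SUPPORTED_DTYPES table are hypothetical names introduced here, and the table covers only the two formats just discussed; it is not a description of OpenCV's internal checks.

```python
import numpy

# Bits per channel accepted by each format, as stated in the text above.
# This table is illustrative and deliberately incomplete.
SUPPORTED_DTYPES = {
    '.bmp': (numpy.uint8,),
    '.png': (numpy.uint8, numpy.uint16),
}

def can_write(image, extension):
    """Return True if the image's dtype and layout suit the given format."""
    is_gray_or_bgr = image.ndim == 2 or (image.ndim == 3 and image.shape[2] == 3)
    allowed = SUPPORTED_DTYPES.get(extension.lower(), ())
    return is_gray_or_bgr and image.dtype.type in allowed

depth16 = numpy.zeros((4, 4), dtype=numpy.uint16)
print(can_write(depth16, '.png'))  # 16-bit grayscale into PNG
print(can_write(depth16, '.bmp'))  # 16-bit grayscale into BMP
```

A check like this can produce a clearer error message than a failed write, at the cost of maintaining the table by hand.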

Converting between an image and raw bytes

Conceptually, a byte is an integer ranging from 0 to 255. Throughout real-time graphic applications today, a pixel is typically represented by one byte per channel, though other representations are also possible.

An OpenCV image is a 2D or 3D array of the numpy.ndarray type (the type produced by the numpy.array function, which is why we often refer to it simply as numpy.array). An 8-bit grayscale image is a 2D array containing byte values. A 24-bit BGR image is a 3D array, which also contains byte values. We may access these values by using an expression such as image[0, 0] or image[0, 0, 0]. The first index is the pixel's y coordinate or row, 0 being the top. The second index is the pixel's x coordinate or column, 0 being the leftmost. The third index (if applicable) represents a color channel. The array's three dimensions can be visualized in the following Cartesian coordinate system:

For example, in an 8-bit grayscale image with a white pixel in the upper-left corner, image[0, 0] is 255. For a 24-bit (8-bit-per-channel) BGR image with a blue pixel in the upper-left corner, image[0, 0] is [255, 0, 0].

Provided that an image has 8 bits per channel, we can cast it to a standard Python bytearray object, which is one-dimensional:

byteArray = bytearray(image)

Conversely, provided that bytearray contains bytes in an appropriate order, we can cast and then reshape it to get a numpy.array type that is an image:

grayImage = numpy.array(grayByteArray).reshape(height, width)
bgrImage = numpy.array(bgrByteArray).reshape(height, width, 3)
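
The following sketch walks through that round trip on a tiny 3x2 BGR image so that the byte order is easy to follow. The variable names are chosen for illustration only.

```python
import numpy

height, width = 2, 3

# 18 sequential bytes: three (B, G, R) values for each of the 6 pixels.
original = bytearray(range(height * width * 3))

# Bytes to image: the flat data is read row by row, pixel by pixel.
bgr = numpy.array(original).reshape(height, width, 3)
print(bgr[0, 0])  # first three bytes: the top-left pixel
print(bgr[1, 2])  # last three bytes: the bottom-right pixel

# Image back to bytes: the original sequence is recovered.
round_trip = bytearray(bgr)
print(round_trip == original)
```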

As a more complete example, let's convert a bytearray that contains random bytes into a grayscale image and a BGR image:

import cv2
import numpy
import os

# Make an array of 120,000 random bytes.
randomByteArray = bytearray(os.urandom(120000))
flatNumpyArray = numpy.array(randomByteArray)

# Convert the array to make a 400x300 grayscale image.
grayImage = flatNumpyArray.reshape(300, 400)
cv2.imwrite('RandomGray.png', grayImage)

# Convert the array to make a 400x100 color image.
bgrImage = flatNumpyArray.reshape(100, 400, 3)
cv2.imwrite('RandomColor.png', bgrImage)

Here, we use Python's standard os.urandom function to generate random raw bytes, which we then convert into a NumPy array. Note that it is also possible to generate a random NumPy array directly (and more efficiently) using a statement such as numpy.random.randint(0, 256, 120000, dtype=numpy.uint8).reshape(300, 400), where the dtype argument keeps the values as 8-bit integers. The only reason we use os.urandom is to help to demonstrate conversion from raw bytes.

After running this script, we should have a pair of randomly generated images, RandomGray.png and RandomColor.png, in the script's directory.

Here is an example of RandomGray.png (though yours will almost certainly differ since it is random):

Similarly, here is an example of RandomColor.png:

Now that we have a better understanding of how an image is formed from data, we can start performing basic operations on it.

Accessing image data with numpy.array

We already know that the easiest (and most common) way to load an image in OpenCV is to use the imread function. We also know that this will return an image, which is really an array (either a 2D or 3D one, depending on the parameters you passed to imread).

The numpy.array class is greatly optimized for array operations, and it allows certain kinds of bulk manipulations that are not available in a plain Python list. These kinds of numpy.array type-specific operations come in handy for image manipulations in OpenCV. However, let's explore image manipulations step by step, starting with a basic example. Say you want to manipulate a pixel at coordinates (0, 0) in a BGR image and turn it into a white pixel:

import cv2

img = cv2.imread('MyPic.png')
img[0, 0] = [255, 255, 255]

If you then save the modified image to file and view it, you will see a white dot in the top-left corner of the image. Naturally, this modification is not very useful, but it begins to show the possibilities. Now, let's leverage the capabilities of numpy.array to perform transformations on an array much faster than we could do with a plain Python list.

Let's say that you want to change the blue value of a particular pixel, say, the pixel at row 150 and column 120. The numpy.array type provides a handy method, item, which takes three parameters: the y (or top) position, the x (or left) position, and the index within the array at that position (remember that in a BGR image, the data at a certain position is a three-element array containing the B, G, and R values in this order). It returns the value at the index position. Another method, itemset, sets the value of a particular channel of a particular pixel to a specified value. itemset takes two arguments: a three-element tuple (y, x, and index) and the new value.

In the following example, we change the value of the blue channel at (150, 120) from its current value to an arbitrary 255:

import cv2

img = cv2.imread('MyPic.png')
img.itemset((150, 120, 0), 255) # Sets the value of a pixel's blue channel
print(img.item(150, 120, 0)) # Prints the value of a pixel's blue channel

For modifying a single element in an array, the itemset method is somewhat faster than the indexing syntax that we saw in the first example in this section.
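
Since the index order is easy to get wrong, here is a small sketch on a synthetic, non-square array rather than a file. It reads a value with item and writes with plain index assignment. As far as we are aware, itemset is no longer available from NumPy 2.0 onward, so index assignment is the more portable choice if you are unsure which NumPy version is installed.

```python
import numpy

# A synthetic 4-row, 6-column BGR image, so that rows and columns differ.
img = numpy.zeros((4, 6, 3), dtype=numpy.uint8)

row, column, blue = 1, 5, 0
img[row, column, blue] = 255        # index assignment: (row, column, channel)
print(img.item(row, column, blue))  # item takes the same index order
print(img[row, column])             # the whole pixel as B, G, R
```

Swapping row and column here would raise an IndexError, because column 5 exists but row 5 does not; on a square image, the same mistake would silently modify the wrong pixel.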

Again, modifying an element of an array does not do much in itself, but it does open a world of possibilities. However, for performance reasons, this is only suitable for small regions of interest. When you need to manipulate an entire image or a large region of interest, it is advisable that you utilize either OpenCV's functions or NumPy's array slicing. The latter allows you to specify a range of indices. Let's consider an example of using array slicing to manipulate color channels. Setting all G (green) values of an image to 0 is as simple as the following code:

import cv2

img = cv2.imread('MyPic.png')
img[:, :, 1] = 0

This piece of code performs a fairly significant operation and is easy to understand. The relevant line is the last one, which basically instructs the program to take all pixels from all rows and columns and set the green value (at index one of the three-element BGR array) to 0. If you display this image, you will notice a complete absence of green.

There are several interesting things we can do by accessing raw pixels with NumPy's array slicing; one of them is defining regions of interest (ROI). Once the region is defined, we can perform a number of operations. For example, we can bind this region to a variable, define a second region, and assign the value of the first region to the second (hence, copying a portion of the image over to another position in the image):

import cv2

img = cv2.imread('MyPic.png')
my_roi = img[0:100, 0:100]
img[300:400, 300:400] = my_roi

It is important to make sure that the two regions correspond in terms of size. If not, NumPy will (rightly) complain that the two shapes are mismatched.
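
To see both the successful copy and the complaint, the following sketch repeats the ROI copy on a synthetic 400x400 image (an assumption made so that no image file is needed) and then deliberately assigns the same ROI to a region of a different width.

```python
import numpy

img = numpy.zeros((400, 400, 3), dtype=numpy.uint8)
img[0:100, 0:100] = [255, 0, 0]  # paint a blue square in the top-left corner

my_roi = img[0:100, 0:100]
img[300:400, 300:400] = my_roi   # same size: the square is copied

try:
    img[300:400, 300:380] = my_roi  # 100x80 target, 100x100 source
except ValueError as error:
    mismatch_message = str(error)
    print('NumPy refused the copy:', mismatch_message)
```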

Finally, we can access the properties of numpy.array, as shown in the following code:

import cv2

img = cv2.imread('MyPic.png')
print(img.shape)
print(img.size)
print(img.dtype)

These three properties are defined as follows:

  • shape: This is a tuple describing the shape of the array. For an image, it contains (in order) the height, width, and, if the image is in color, the number of channels. The length of the shape tuple is a useful way to determine whether an image is grayscale or color. For a grayscale image, we have len(shape) == 2, and for a color image, len(shape) == 3.
  • size: This is the number of elements in the array. In the case of a grayscale image, this is the same as the number of pixels. In the case of a BGR image, it is three times the number of pixels because each pixel is represented by three elements (B, G, and R).
  • dtype: This is the datatype of the array's elements. For an 8-bit-per-channel image, the datatype is numpy.uint8.
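
The three properties can be combined into a small summary function. The describe helper below is a hypothetical name introduced for illustration, and it is applied to synthetic arrays so that it runs without an image file.

```python
import numpy

def describe(image):
    """Summarize an image array using its shape, size, and dtype."""
    height, width = image.shape[:2]
    # A 2D array is grayscale; a 3D array stores its channel count last.
    channels = 1 if len(image.shape) == 2 else image.shape[2]
    return {
        'height': height,
        'width': width,
        'channels': channels,
        'pixels': image.size // channels,
        'dtype': str(image.dtype),
    }

gray = numpy.zeros((300, 400), dtype=numpy.uint8)
bgr = numpy.zeros((100, 400, 3), dtype=numpy.uint8)
print(describe(gray))
print(describe(bgr))
```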

All in all, it is strongly advised that you familiarize yourself with NumPy in general, and numpy.array in particular, when working with OpenCV. This class is the foundation of any image processing done with OpenCV in Python.

Reading/writing a video file

OpenCV provides the VideoCapture and VideoWriter classes, which support various video file formats. The supported formats vary depending on the operating system and the build configuration of OpenCV, but normally it is safe to assume that the AVI format is supported. Via its read method, a VideoCapture object may be polled for new frames until it reaches the end of its video file. Each frame is an image in a BGR format.

Conversely, an image may be passed to the write method of the VideoWriter class, which appends the image to the VideoWriter object's output file. Let's look at an example that reads frames from one AVI file and writes them to another with a YUV encoding:

import cv2

videoCapture = cv2.VideoCapture('MyInputVid.avi')
fps = videoCapture.get(cv2.CAP_PROP_FPS)
size = (int(videoCapture.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(videoCapture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
videoWriter = cv2.VideoWriter(
    'MyOutputVid.avi', cv2.VideoWriter_fourcc('I','4','2','0'),
    fps, size)

success, frame = videoCapture.read()
while success:  # Loop until there are no more frames.
    videoWriter.write(frame)
    success, frame = videoCapture.read()

The arguments to the constructor of the VideoWriter class deserve special attention. A video's filename must be specified. Any preexisting file with this name is overwritten. A video codec must also be specified. The available codecs may vary from system to system. The supported options may include the following:

  • 0: This option is an uncompressed raw video file. The file extension should be .avi.
  • cv2.VideoWriter_fourcc('I','4','2','0'): This option is an uncompressed YUV encoding, 4:2:0 chroma subsampled. This encoding is widely compatible but produces large files. The file extension should be .avi.
  • cv2.VideoWriter_fourcc('P','I','M','1'): This option is MPEG-1. The file extension should be .avi.
  • cv2.VideoWriter_fourcc('X','V','I','D'): This option is a relatively old MPEG-4 encoding. It is a good option if you want to limit the size of the resulting video. The file extension should be .avi.
  • cv2.VideoWriter_fourcc('M','P','4','V'): This option is another relatively old MPEG-4 encoding. It is a good option if you want to limit the size of the resulting video. The file extension should be .mp4.
  • cv2.VideoWriter_fourcc('X','2','6','4'): This option is a relatively new MPEG-4 encoding. It may be the best option if you want to limit the size of the resulting video. The file extension should be .mp4.
  • cv2.VideoWriter_fourcc('T','H','E','O'): This option is Ogg Theora. The file extension should be .ogv.
  • cv2.VideoWriter_fourcc('F','L','V','1'): This option is a Flash video. The file extension should be .flv.

A frame rate and frame size must be specified too. Since we are copying from another video, these properties can be read from the get method of the VideoCapture class.
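
The copy loop itself can be factored into a function that depends only on a read method and a write method. The sketch below is one possible shape for such a function, not part of OpenCV; copy_frames and max_frames are hypothetical names, and the objects passed in are merely assumed to behave like cv2.VideoCapture and cv2.VideoWriter.

```python
def copy_frames(capture, writer, max_frames=None):
    """Copy frames from a capture-like object to a writer-like object.

    Returns the number of frames written.
    """
    written = 0
    success, frame = capture.read()
    while success and (max_frames is None or written < max_frames):
        writer.write(frame)
        written += 1
        success, frame = capture.read()
    return written

# Hypothetical usage with OpenCV (file names are placeholders):
#   capture = cv2.VideoCapture('MyInputVid.avi')
#   writer = cv2.VideoWriter('MyOutputVid.avi', fourcc, fps, size)
#   try:
#       copy_frames(capture, writer)
#   finally:
#       capture.release()
#       writer.release()
```

Releasing both objects in a finally clause helps to ensure that the output file is finalized even if an error interrupts the loop.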

Capturing camera frames

A stream of camera frames is represented by a VideoCapture object too. However, for a camera, we construct a VideoCapture object by passing the camera's device index instead of a video's filename. Let's consider the following example, which captures 10 seconds of video from a camera and writes it to an AVI file. The code is similar to the previous section's sample (which captured from a video file instead of a camera); the differences are the device index passed to VideoCapture, the assumed frame rate, and the frame counter that ends the loop:

import cv2

cameraCapture = cv2.VideoCapture(0)
fps = 30  # An assumption
size = (int(cameraCapture.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(cameraCapture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
videoWriter = cv2.VideoWriter(
    'MyOutputVid.avi', cv2.VideoWriter_fourcc('I','4','2','0'),
    fps, size)

success, frame = cameraCapture.read()
numFramesRemaining = 10 * fps  # Write 10 seconds of frames
while success and numFramesRemaining > 0:
    videoWriter.write(frame)
    success, frame = cameraCapture.read()
    numFramesRemaining -= 1

For some cameras on certain systems, cameraCapture.get(cv2.CAP_PROP_FRAME_WIDTH) and cameraCapture.get(cv2.CAP_PROP_FRAME_HEIGHT) may return inaccurate results. To be more certain of the actual image dimensions, you can first capture a frame and then get its height and width with code such as h, w = frame.shape[:2]. Occasionally, you might even encounter a camera that yields a few bad frames with unstable dimensions before it starts yielding good frames with stable dimensions. If you are concerned about guarding against this kind of quirk, you may want to read and ignore a few frames at the start of a capture session.
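
A minimal sketch of this precaution follows. The read_stable_frame function and its warmup_frames parameter are hypothetical, the default of five discarded frames is an arbitrary assumption, and the capture argument is only assumed to offer a read method like that of cv2.VideoCapture.

```python
def read_stable_frame(capture, warmup_frames=5):
    """Discard a few initial frames, then return (frame, (width, height)).

    Returns (None, None) if the capture stops yielding frames.
    """
    for _ in range(warmup_frames):
        success, _frame = capture.read()  # read and ignore
        if not success:
            return None, None
    success, frame = capture.read()
    if not success:
        return None, None
    # shape starts with (rows, columns), that is, (height, width).
    height, width = frame.shape[:2]
    return frame, (width, height)
```

The size is returned as (width, height) because that is the order used for the size argument of VideoWriter in the preceding samples.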

Unfortunately, in most cases, the get method of VideoCapture does not return an accurate value for the camera's frame rate; it typically returns 0. The official documentation at http://docs.opencv.org/modules/highgui/doc/reading_and_writing_images_and_video.html warns of the following:

"Value 0 is returned when querying a property that is not supported by the backend used by the VideoCapture instance.

Note
Reading / writing properties involves many layers. Some unexpected result might happens [sic] along this chain.
VideoCapture -> API Backend -> Operating System -> Device Driver -> Device Hardware
The returned value might be different from what really is used by the device or it could be encoded using device-dependent rules (for example, steps or percentage). Effective behavior depends from [sic] device driver and the API backend."

To create an appropriate VideoWriter object for the camera, we have to either make an assumption about the frame rate (as we did in the preceding code) or measure it using a timer. The latter approach is better and we will cover it later in this chapter.

The number of cameras and their order is, of course, system-dependent. Unfortunately, OpenCV does not provide any means of querying the number of cameras or their properties. If an invalid index is used to construct a VideoCapture object, the object will not yield any frames; its read method will return (False, None). To avoid trying to retrieve frames from a VideoCapture object that was not opened correctly, you may want to first call the VideoCapture.isOpened method, which returns a Boolean.

The read method is inappropriate when we need to synchronize either a set of cameras or a multihead camera such as a stereo camera. Then, we use the grab and retrieve methods instead. For a set of two cameras, we can use code similar to the following:

success0 = cameraCapture0.grab()
success1 = cameraCapture1.grab()
if success0 and success1:
    # retrieve returns a (success, image) pair, like read.
    _, frame0 = cameraCapture0.retrieve()
    _, frame1 = cameraCapture1.retrieve()
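
The same pattern can be generalized to any number of cameras. The read_synchronized function below is a hypothetical helper, sketched under the assumption that each element of captures offers grab and retrieve methods like those of cv2.VideoCapture.

```python
def read_synchronized(captures):
    """Grab from every capture first, then retrieve the frames.

    Returns a list of frames, or None if any grab or retrieve fails.
    """
    # Grab all frames back to back so that their capture times stay close.
    # The list is built in full so that every capture is grabbed, even if
    # an earlier one fails.
    if not all([capture.grab() for capture in captures]):
        return None
    frames = []
    for capture in captures:
        success, frame = capture.retrieve()
        if not success:
            return None
        frames.append(frame)
    return frames
```

The slower retrieve step, which decodes each image, is deliberately kept out of the grabbing pass.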

Displaying an image in a window

One of the most basic operations in OpenCV is displaying an image in a window. This can be done with the imshow function. If you come from any other GUI framework background, you might think it sufficient to call imshow to display an image. However, in OpenCV, the window is drawn (or re-drawn) only when you call another function, waitKey. The latter function pumps the window's event queue (allowing various events such as drawing to be handled), and it returns the keycode of any key that the user may have typed within a specified timeout. To some extent, this rudimentary design simplifies the task of developing demos that use video or webcam input; at least the developer has manual control over the capture and display of new frames.

Here is a very simple sample script to read an image from a file and display it:

import cv2

img = cv2.imread('my-image.png')
cv2.imshow('my image', img)
cv2.waitKey()
cv2.destroyAllWindows()

The imshow function takes two parameters: the name of the window in which we want to display the image and the image itself. We will talk about waitKey in more detail in the next section, Displaying camera frames in a window.

The aptly named destroyAllWindows function disposes of all of the windows created by OpenCV.

Displaying camera frames in a window

OpenCV allows named windows to be created, redrawn, and destroyed using the namedWindow, imshow, and destroyWindow functions. Also, any window may capture keyboard input via the waitKey function and mouse input via the setMouseCallback function. Let's look at an example where we show the frames captured from a live camera:

import cv2

clicked = False
def onMouse(event, x, y, flags, param):
    global clicked
    if event == cv2.EVENT_LBUTTONUP:
        clicked = True

cameraCapture = cv2.VideoCapture(0)
cv2.namedWindow('MyWindow')
cv2.setMouseCallback('MyWindow', onMouse)

print('Showing camera feed. Click window or press any key to stop.')
success, frame = cameraCapture.read()
while success and cv2.waitKey(1) == -1 and not clicked:
    cv2.imshow('MyWindow', frame)
    success, frame = cameraCapture.read()

cv2.destroyWindow('MyWindow')
cameraCapture.release()

The argument for waitKey is a number of milliseconds to wait for keyboard input. By default, it is 0, which is a special value meaning infinity. The return value is either -1 (meaning that no key has been pressed) or an ASCII keycode, such as 27 for Esc. For a list of ASCII keycodes, refer to http://www.asciitable.com/. Also, note that Python provides a standard function, ord, which can convert a character into its ASCII keycode. For example, ord('a') returns 97.
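
As an illustration of interpreting these return values, the following sketch maps a keycode to a readable name. The decode_key function is hypothetical and handles only the cases described above; it takes a plain integer, so it can be tried without opening a window.

```python
ESCAPE = 27  # ASCII keycode for Esc, as mentioned above

def decode_key(keycode):
    """Translate a waitKey return value into a readable name, or None."""
    if keycode == -1:
        return None  # no key was pressed within the timeout
    if keycode == ESCAPE:
        return 'escape'
    if keycode == ord(' '):
        return 'space'
    return chr(keycode)  # the inverse of ord

print(decode_key(-1))
print(decode_key(27))
print(decode_key(ord('a')))
```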

Again, note that OpenCV's window functions and waitKey are interdependent. OpenCV windows are only updated when waitKey is called. Conversely, waitKey only captures input when an OpenCV window has focus.

The mouse callback passed to setMouseCallback should take five arguments, as seen in our code sample. The callback's param argument is set as an optional third argument to setMouseCallback. By default, it is 0. The callback's event argument is one of the following actions:

  • cv2.EVENT_MOUSEMOVE: This event refers to mouse movement.
  • cv2.EVENT_LBUTTONDOWN: This event refers to the left button going down when it is pressed.
  • cv2.EVENT_RBUTTONDOWN: This event refers to the right button going down when it is pressed.
  • cv2.EVENT_MBUTTONDOWN: This event refers to the middle button going down when it is pressed.
  • cv2.EVENT_LBUTTONUP: This event refers to the left button coming back up when it is released.
  • cv2.EVENT_RBUTTONUP: This event refers to the right button coming back up when it is released.
  • cv2.EVENT_MBUTTONUP: This event refers to the middle button coming back up when it is released.
  • cv2.EVENT_LBUTTONDBLCLK: This event refers to the left button being double-clicked.
  • cv2.EVENT_RBUTTONDBLCLK: This event refers to the right button being double-clicked.
  • cv2.EVENT_MBUTTONDBLCLK: This event refers to the middle button being double-clicked.

The mouse callback's flags argument may be some bitwise combination of the following flags:

  • cv2.EVENT_FLAG_LBUTTON: This flag refers to the left button being pressed.
  • cv2.EVENT_FLAG_RBUTTON: This flag refers to the right button being pressed.
  • cv2.EVENT_FLAG_MBUTTON: This flag refers to the middle button being pressed.
  • cv2.EVENT_FLAG_CTRLKEY: This flag refers to the Ctrl key being pressed.
  • cv2.EVENT_FLAG_SHIFTKEY: This flag refers to the Shift key being pressed.
  • cv2.EVENT_FLAG_ALTKEY: This flag refers to the Alt key being pressed.
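
Because the flags are combined bitwise, a single flag is tested with a bitwise AND rather than an equality comparison. The has_flag helper below is a hypothetical sketch; the integer values in the demonstration are placeholders, and real code should pass the cv2.EVENT_FLAG_* constants instead.

```python
def has_flag(flags, flag):
    """Return True if every bit of flag is set in flags."""
    return (flags & flag) == flag

# Placeholder bit values for the demonstration only.
DEMO_CTRL = 0b01
DEMO_SHIFT = 0b10

combined = DEMO_CTRL | DEMO_SHIFT
print(has_flag(combined, DEMO_CTRL))    # Ctrl is part of the combination
print(has_flag(DEMO_SHIFT, DEMO_CTRL))  # Ctrl is absent

# Hypothetical use inside a mouse callback:
#   def onMouse(event, x, y, flags, param):
#       if (event == cv2.EVENT_LBUTTONUP and
#               has_flag(flags, cv2.EVENT_FLAG_CTRLKEY)):
#           ...  # handle Ctrl+click
```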

Unfortunately, OpenCV does not provide any means of manually handling window events. For example, we cannot stop our application when a window's close button is clicked. Due to OpenCV's limited event handling and GUI capabilities, many developers prefer to integrate it with other application frameworks. Later in this chapter, in the Cameo – an object-oriented design section, we will design an abstraction layer to help to integrate OpenCV with any application framework.

Project Cameo (face tracking and image manipulation)

OpenCV is often studied through a cookbook approach that covers a lot of algorithms, but nothing about high-level application development. To an extent, this approach is understandable because OpenCV's potential applications are so diverse. OpenCV is used in a wide variety of applications, such as photo/video editors, motion-controlled games, a robot's AI, or psychology experiments where we log participants' eye movements. Across these varied use cases, can we truly study a useful set of abstractions?

The book's authors believe we can, and the sooner we start creating abstractions, the better. We will structure many of our OpenCV examples around a single application, but, at each step, we will design a component of this application to be extensible and reusable.

We will develop an interactive application that performs face tracking and image manipulations on camera input in real time. This type of application covers a broad range of OpenCV's functionality and challenges us to create an efficient, effective implementation.

Specifically, our application will merge faces in real time. Given two streams of camera input (or, optionally, prerecorded video input), the application will superimpose faces from one stream atop faces in the other. Filters and distortions will be applied to give this blended scene a unified look and feel. Users should have the experience of being engaged in a live performance where they enter another environment and persona. This type of user experience is popular in amusement parks such as Disneyland.

In such an application, users would immediately notice flaws, such as a low frame rate or inaccurate tracking. To get the best results, we will try several approaches using conventional imaging and depth imaging.

We will call our application Cameo. A cameo (in jewelry) is a small portrait of a person or (in film) a very brief role played by a celebrity.

Cameo – an object-oriented design

Python applications can be written in a purely procedural style. This is often done with small applications, such as our basic I/O scripts, discussed previously. However, from now on, we will often use an object-oriented style because it promotes modularity and extensibility.

From our overview of OpenCV's I/O functionality, we know that all images are similar, regardless of their source or destination. No matter how we obtain a stream of images or where we send it as output, we can apply the same application-specific logic to each frame in this stream. Separation of I/O code and application code becomes especially convenient in an application, such as Cameo, which uses multiple I/O streams.

We will create classes called CaptureManager and WindowManager as high-level interfaces to I/O streams. Our application code may use CaptureManager to read new frames and, optionally, to dispatch each frame to one or more outputs, including a still image file, a video file, and a window (via a WindowManager class). A WindowManager class lets our application code handle a window and events in an object-oriented style.

Both CaptureManager and WindowManager are extensible. We could make implementations that do not rely on OpenCV for I/O.

Abstracting a video stream with managers.CaptureManager

As we have seen, OpenCV can capture, show, and record a stream of images from either a video file or camera, but there are some special considerations in each case. Our CaptureManager class abstracts some of the differences and provides a higher-level interface to dispatch images from the capture stream to one or more outputs—a still image file, video file, or window.

A CaptureManager object is initialized with a VideoCapture object and has enterFrame and exitFrame methods that should typically be called on every iteration of an application's main loop. Between a call to enterFrame and exitFrame, the application may (any number of times) set a channel property and get a frame property. The channel property is initially 0 and only multihead cameras use other values. The frame property is an image corresponding to the current channel's state when enterFrame was called.

The CaptureManager class also has the writeImage, startWritingVideo, and stopWritingVideo methods, which may be called at any time. Actual file writing is postponed until exitFrame. Also, during the exitFrame method, frame may be shown in a window, depending on whether the application code provides a WindowManager object, either as an argument to the constructor of CaptureManager or by setting the previewWindowManager property.

If the application code manipulates frame, the manipulations are reflected in recorded files and in the window. A CaptureManager class has a constructor argument and property called shouldMirrorPreview, which should be True if we want frame to be mirrored (horizontally flipped) in the window but not in recorded files. Typically, when facing a camera, users prefer a live camera feed to be mirrored.

Recall that a VideoWriter object needs a frame rate, but OpenCV does not provide any reliable way to get an accurate frame rate for a camera. The CaptureManager class works around this limitation by using a frame counter and Python's standard time.time function to estimate the frame rate if necessary. This approach is not foolproof. Depending on frame rate fluctuations and the system-dependent implementation of time.time, the accuracy of the estimate might still be poor in some cases. However, if we deploy to unknown hardware, it is better than just assuming that the user's camera has a particular frame rate.
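The estimation idea can be tried in isolation before we look at the full class. The following is a minimal sketch, not part of managers.py: the FpsEstimator name, the tick method, and the injectable clock argument are assumptions made for illustration, so that the logic can run without a camera.

```python
import time


class FpsEstimator(object):
    """Estimate a frame rate from a frame counter and a clock.

    Hypothetical helper for illustration only. It counts frames
    and divides by the time elapsed since the first frame.
    """

    def __init__(self, clock=time.time):
        self._clock = clock
        self._startTime = None
        self._framesElapsed = 0
        self.fpsEstimate = None

    def tick(self):
        """Call once per frame."""
        if self._framesElapsed == 0:
            # The first frame only starts the timer.
            self._startTime = self._clock()
        else:
            timeElapsed = self._clock() - self._startTime
            # Guard against a clock that has not advanced yet
            # (an extra safety check added in this sketch).
            if timeElapsed > 0:
                self.fpsEstimate = self._framesElapsed / timeElapsed
        self._framesElapsed += 1


# Simulate a camera that delivers one frame every 0.04 seconds.
fakeTimes = iter([i * 0.04 for i in range(26)])
estimator = FpsEstimator(clock=lambda: next(fakeTimes))
for _ in range(26):
    estimator.tick()
print(round(estimator.fpsEstimate, 2))  # prints: 25.0
```

With a real clock, the estimate settles only after several frames, which is why an application may prefer to wait before relying on it.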

Let's create a file called managers.py, which will contain our implementation of CaptureManager. This implementation turns out to be quite long, so we will look at it in several pieces:

  1. First, let's add imports and a constructor, as follows:
import cv2
import numpy
import time


class CaptureManager(object):

    def __init__(self, capture, previewWindowManager = None,
                 shouldMirrorPreview = False):

        self.previewWindowManager = previewWindowManager
        self.shouldMirrorPreview = shouldMirrorPreview

        self._capture = capture
        self._channel = 0
        self._enteredFrame = False
        self._frame = None
        self._imageFilename = None
        self._videoFilename = None
        self._videoEncoding = None
        self._videoWriter = None

        self._startTime = None
        self._framesElapsed = 0
        self._fpsEstimate = None
  2. Next, let's add the following getter and setter methods for the properties of CaptureManager:
    @property
    def channel(self):
        return self._channel

    @channel.setter
    def channel(self, value):
        if self._channel != value:
            self._channel = value
            self._frame = None

    @property
    def frame(self):
        if self._enteredFrame and self._frame is None:
            _, self._frame = self._capture.retrieve(
                self._frame, self.channel)
        return self._frame

    @property
    def isWritingImage(self):
        return self._imageFilename is not None

    @property
    def isWritingVideo(self):
        return self._videoFilename is not None

Note that most of the member variables are nonpublic, as denoted by the underscore prefix in variable names, such as self._enteredFrame. These nonpublic variables relate to the state of the current frame and any file-writing operations. As discussed previously, the application code only needs to configure a few things, which are implemented as constructor arguments and settable public properties: the camera channel, the window manager, and the option to mirror the camera preview.

This book assumes a certain level of familiarity with Python. However, if the @ annotations (for example, @property) are unfamiliar, refer to the Python documentation about decorators. A decorator is a built-in language feature that wraps one function in another, and it is normally used to apply the same user-defined behavior in several places in an application. Specifically, you can find relevant documentation at https://docs.python.org/3/reference/compound_stmts.html#grammar-token-decorator.
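As a brief illustration of the property decorator outside of OpenCV, the following sketch defines a readable and writable property in the same style as channel. The Counter class and its attribute names are hypothetical and exist only for this example.

```python
class Counter(object):
    """Hypothetical class showing a property with a getter and setter."""

    def __init__(self):
        self._value = 0
        self.changes = 0

    @property
    def value(self):
        # Runs whenever code reads counter.value.
        return self._value

    @value.setter
    def value(self, newValue):
        # Runs whenever code assigns to counter.value.
        if self._value != newValue:
            self._value = newValue
            self.changes += 1


counter = Counter()
counter.value = 5  # Calls the setter; the value changes.
counter.value = 5  # Calls the setter again; nothing changes.
print(counter.value, counter.changes)  # prints: 5 1
```

The calling code uses plain attribute syntax, yet the class keeps control over what happens on each read and write.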

Python does not enforce the concept of nonpublic member variables, but in cases where the developer intends a variable to be treated as nonpublic, you will often see the single-underscore prefix (_) or double-underscore prefix (__). The single-underscore prefix is just a convention, indicating that the variable should be treated as protected (accessed only within the class and its subclasses). The double-underscore prefix actually causes the Python interpreter to rename the variable, such that MyClass.__myVariable becomes MyClass._MyClass__myVariable. This is called name mangling (quite appropriately). By convention, such a variable should be treated as private (accessed only within the class, and not its subclasses). The same prefixes, with the same significance, can be applied to methods as well as variables.
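The effect of the two prefixes can be observed directly. The following sketch uses a hypothetical class named Base with illustrative attribute names.

```python
class Base(object):

    def __init__(self):
        self._protected = 'single underscore'
        self.__private = 'double underscore'


base = Base()

# A single-underscore name is accessible exactly as written.
print(base._protected)

# A double-underscore name is stored under a mangled name.
print(base._Base__private)

# The unmangled name does not exist on the instance.
print(hasattr(base, '__private'))  # prints: False
```

Neither prefix prevents access. Both only signal intent, and the second makes accidental access from outside the class less likely.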
  3. Continuing with our implementation, let's add the enterFrame method to managers.py:
    def enterFrame(self):
        """Capture the next frame, if any."""

        # But first, check that any previous frame was exited.
        assert not self._enteredFrame, \
            'previous enterFrame() had no matching exitFrame()'

        if self._capture is not None:
            self._enteredFrame = self._capture.grab()

Note that the implementation of enterFrame only grabs (synchronizes) a frame, whereas actual retrieval from a channel is postponed until the frame property is next read.
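To see the grab-then-retrieve pattern without a camera, consider the following sketch. FakeCapture is a hypothetical stand-in for cv2.VideoCapture that only counts calls, and LazyFrameSource is a cut-down illustration of the relevant parts of CaptureManager, not a replacement for it.

```python
class FakeCapture(object):
    """Hypothetical stand-in for cv2.VideoCapture.

    It counts calls so that we can see when retrieval happens.
    """

    def __init__(self):
        self.grabCount = 0
        self.retrieveCount = 0

    def grab(self):
        self.grabCount += 1
        return True

    def retrieve(self, image=None, flag=0):
        self.retrieveCount += 1
        return True, 'frame %d, channel %d' % (self.grabCount, flag)


class LazyFrameSource(object):
    """Grab eagerly, but retrieve only when the frame is read."""

    def __init__(self, capture):
        self._capture = capture
        self._enteredFrame = False
        self._frame = None

    def enterFrame(self):
        self._enteredFrame = self._capture.grab()

    @property
    def frame(self):
        if self._enteredFrame and self._frame is None:
            _, self._frame = self._capture.retrieve(self._frame, 0)
        return self._frame


capture = FakeCapture()
source = LazyFrameSource(capture)
source.enterFrame()
print(capture.retrieveCount)  # prints: 0
firstRead = source.frame
secondRead = source.frame
print(capture.retrieveCount)  # prints: 1
```

The first read triggers one retrieval and caches the result, so later reads within the same frame cost nothing extra.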

  4. Next, let's add the exitFrame method to managers.py:
    def exitFrame(self):
        """Draw to the window. Write to files. Release the
        frame."""

        # Check whether any grabbed frame is retrievable.
        # The getter may retrieve and cache the frame.
        if self.frame is None:
            self._enteredFrame = False
            return

        # Update the FPS estimate and related variables.
        if self._framesElapsed == 0:
            self._startTime = time.time()
        else:
            timeElapsed = time.time() - self._startTime
            self._fpsEstimate = self._framesElapsed / timeElapsed
        self._framesElapsed += 1

        # Draw to the window, if any.
        if self.previewWindowManager is not None:
            if self.shouldMirrorPreview:
                mirroredFrame = numpy.fliplr(self._frame)
                self.previewWindowManager.show(mirroredFrame)
            else:
                self.previewWindowManager.show(self._frame)

        # Write to the image file, if any.
        if self.isWritingImage:
            cv2.imwrite(self._imageFilename, self._frame)
            self._imageFilename = None

        # Write to the video file, if any.
        self._writeVideoFrame()

        # Release the frame.
        self._frame = None
        self._enteredFrame = False

The implementation of exitFrame takes the image from the current channel, estimates a frame rate, shows the image via the window manager (if any), and fulfills any pending requests to write the image to files.

  5. Several other methods also pertain to file writing. Let's add the following implementations of public methods named writeImage, startWritingVideo, and stopWritingVideo to managers.py:
    def writeImage(self, filename):
        """Write the next exited frame to an image file."""
        self._imageFilename = filename

    def startWritingVideo(
            self, filename,
            encoding = cv2.VideoWriter_fourcc('M','J','P','G')):
        """Start writing exited frames to a video file."""
        self._videoFilename = filename
        self._videoEncoding = encoding

    def stopWritingVideo(self):
        """Stop writing exited frames to a video file."""
        self._videoFilename = None
        self._videoEncoding = None
        self._videoWriter = None

The preceding methods simply update the parameters for file-writing operations, whereas the actual writing operations are postponed to the next call of exitFrame.

  6. Earlier in this section, we saw that exitFrame calls a helper method named _writeVideoFrame. Let's add the following implementation of _writeVideoFrame to managers.py:
    def _writeVideoFrame(self):

        if not self.isWritingVideo:
            return

        if self._videoWriter is None:
            fps = self._capture.get(cv2.CAP_PROP_FPS)
            if fps <= 0.0:
                # The capture's FPS is unknown so use an estimate.
                if self._framesElapsed < 20:
                    # Wait until more frames elapse so that the
                    # estimate is more stable.
                    return
                else:
                    fps = self._fpsEstimate
            size = (int(self._capture.get(
                        cv2.CAP_PROP_FRAME_WIDTH)),
                    int(self._capture.get(
                        cv2.CAP_PROP_FRAME_HEIGHT)))
            self._videoWriter = cv2.VideoWriter(
                self._videoFilename, self._videoEncoding,
                fps, size)

        self._videoWriter.write(self._frame)

The preceding method creates or appends to a video file in a manner that should be familiar from our earlier scripts (refer to the Reading/writing a video file section, earlier in this chapter). However, in situations where the frame rate is unknown, we skip some frames at the start of the capture session so that we have time to build up an estimate of the frame rate.
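The frame rate decision inside _writeVideoFrame can be restated as a small pure function, which makes the three possible outcomes easier to see. This is only a sketch: the chooseFps name and the minFrames parameter are hypothetical, and the function is not part of managers.py.

```python
def chooseFps(reportedFps, framesElapsed, fpsEstimate, minFrames=20):
    """Return a frame rate for a new video writer, or None to wait.

    Hypothetical helper that restates the decision made in
    _writeVideoFrame.
    """
    if reportedFps <= 0.0:
        # The capture's frame rate is unknown.
        if framesElapsed < minFrames:
            # Too few frames have elapsed for a stable estimate.
            return None
        return fpsEstimate
    # The capture reports a usable frame rate.
    return reportedFps


print(chooseFps(30.0, 0, None))  # prints: 30.0
print(chooseFps(0.0, 5, 24.0))   # prints: None
print(chooseFps(0.0, 20, 24.7))  # prints: 24.7
```

A result of None corresponds to the early return in the method, where the current frame is skipped and no writer is created yet.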

This concludes our implementation of CaptureManager. Although it relies on VideoCapture, we could make other implementations that do not use OpenCV for input. For example, we could make a subclass that is instantiated with a socket connection, whose byte stream could be parsed as a stream of images. Also, we could make a subclass that uses a third-party camera library with different hardware support than what OpenCV provides. However, for Cameo, our current implementation is sufficient.
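One lightweight way to experiment with alternative inputs is duck typing rather than subclassing: our CaptureManager code calls only grab, retrieve, and get on its capture object, so any object that offers these three methods could be passed in. The following sketch shows such an object. The SequenceCapture class is hypothetical, and its frames are plain strings standing in for images.

```python
class SequenceCapture(object):
    """Serve frames from an in-memory sequence.

    Hypothetical class. It implements only the three methods that
    CaptureManager calls on its capture object.
    """

    def __init__(self, frames, properties=None):
        self._frames = list(frames)
        self._index = -1
        self._properties = properties or {}

    def grab(self):
        """Advance to the next frame. Return False at the end."""
        if self._index + 1 >= len(self._frames):
            return False
        self._index += 1
        return True

    def retrieve(self, image=None, flag=0):
        """Return (success, frame) for the most recent grab."""
        if self._index < 0:
            return False, None
        return True, self._frames[self._index]

    def get(self, propId):
        """Return a property value, or 0.0 if it is unknown."""
        return self._properties.get(propId, 0.0)


capture = SequenceCapture(['frame A', 'frame B'])
received = []
while capture.grab():
    success, frame = capture.retrieve()
    received.append(frame)
print(received)  # prints: ['frame A', 'frame B']
```

Because this sketch reports 0.0 for every unknown property, a real use with video recording would need to supply the frame width and height through the properties dictionary.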

Abstracting a window and keyboard with managers.WindowManager

As we have seen, OpenCV provides functions that cause a window to be created, be destroyed, show an image, and process events. Rather than being methods of a window class, these functions require a window's name to be passed as an argument. Since this interface is not object-oriented, it is arguably inconsistent with OpenCV's general style. Also, it is unlikely to be compatible with other window- or event-handling interfaces that we might eventually want to use instead of OpenCV's.

For the sake of object orientation and adaptability, we abstract this functionality into a WindowManager class with the createWindow, destroyWindow, show, and processEvents methods. As a property, WindowManager has a function object called keypressCallback, which (if it is not None) is called from processEvents in response to any keypress. The keypressCallback object must be a function that takes a single argument, specifically an ASCII keycode.

Let's add an implementation of WindowManager to managers.py. The implementation begins with the following class declaration and __init__ method:

class WindowManager(object):

    def __init__(self, windowName, keypressCallback = None):
        self.keypressCallback = keypressCallback

        self._windowName = windowName
        self._isWindowCreated = False

The implementation continues with the following methods to manage the life cycle of the window and its events:

    @property
    def isWindowCreated(self):
        return self._isWindowCreated

    def createWindow(self):
        cv2.namedWindow(self._windowName)
        self._isWindowCreated = True

    def show(self, frame):
        cv2.imshow(self._windowName, frame)

    def destroyWindow(self):
        cv2.destroyWindow(self._windowName)
        self._isWindowCreated = False

    def processEvents(self):
        keycode = cv2.waitKey(1)
        if self.keypressCallback is not None and keycode != -1:
            self.keypressCallback(keycode)

Our current implementation only supports keyboard events, which will be sufficient for Cameo. However, we could modify WindowManager to support mouse events, too. For example, the class interface could be expanded to include a mouseCallback property (and an optional constructor argument), but could otherwise remain the same. With an event framework other than OpenCV's, we could support additional event types in the same way by adding callback properties.

Applying everything with cameo.Cameo

Our application is represented by the Cameo class with two methods: run and onKeypress. On initialization, a Cameo object creates a WindowManager object with onKeypress as a callback, as well as a CaptureManager object using a camera (specifically, a cv2.VideoCapture object) and the same WindowManager object. When run is called, the application executes a main loop in which frames and events are processed.
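The way a callback ends the main loop can be tried without a camera or a window. In the following sketch, ScriptedWindow is a hypothetical stand-in for WindowManager that replays a fixed list of keycodes, and MiniApp is a hypothetical application with the same loop shape as the one described above.

```python
class ScriptedWindow(object):
    """Hypothetical stand-in for WindowManager.

    Instead of polling a real window, it replays a fixed list of
    keycodes, where -1 means that no key was pressed.
    """

    def __init__(self, keycodes, keypressCallback=None):
        self.keypressCallback = keypressCallback
        self._keycodes = list(keycodes)
        self.isWindowCreated = False

    def createWindow(self):
        self.isWindowCreated = True

    def destroyWindow(self):
        self.isWindowCreated = False

    def processEvents(self):
        keycode = self._keycodes.pop(0) if self._keycodes else -1
        if self.keypressCallback is not None and keycode != -1:
            self.keypressCallback(keycode)


class MiniApp(object):
    """Hypothetical application with a callback-driven main loop."""

    def __init__(self, keycodes):
        self._window = ScriptedWindow(keycodes, self.onKeypress)
        self.framesProcessed = 0

    def run(self):
        self._window.createWindow()
        while self._window.isWindowCreated:
            self.framesProcessed += 1  # Stands in for frame handling.
            self._window.processEvents()

    def onKeypress(self, keycode):
        if keycode == 27:  # escape
            self._window.destroyWindow()


# Two iterations without a keypress, then Esc.
app = MiniApp([-1, -1, 27])
app.run()
print(app.framesProcessed)  # prints: 3
```

The loop itself never checks for the Esc key. It only checks whether the window still exists, and the callback is what changes that state. Note that the script must contain the Esc keycode, or this loop would never end.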

As a result of event processing, onKeypress may be called. The spacebar causes a screenshot to be taken, Tab causes a screencast (a video recording) to start/stop, and Esc causes the application to quit.
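As the number of key bindings grows, a lookup table can be an alternative to a chain of if and elif statements. The following sketch shows this design under stated assumptions: the KeyDispatcher class is hypothetical and is not used by Cameo, and the handlers here only record which action was requested.

```python
class KeyDispatcher(object):
    """Map keycodes to handler functions.

    Hypothetical alternative to an if/elif chain in a keypress
    callback.
    """

    def __init__(self):
        self._handlers = {}

    def register(self, keycode, handler):
        self._handlers[keycode] = handler

    def __call__(self, keycode):
        handler = self._handlers.get(keycode)
        if handler is not None:
            handler()


log = []
dispatcher = KeyDispatcher()
dispatcher.register(32, lambda: log.append('screenshot'))  # space
dispatcher.register(9, lambda: log.append('toggle recording'))  # tab
dispatcher.register(27, lambda: log.append('quit'))  # escape

for keycode in (32, 120, 27):  # space, x (unhandled), escape
    dispatcher(keycode)
print(log)  # prints: ['screenshot', 'quit']
```

Since a KeyDispatcher object is callable with a single keycode argument, it has the shape that a keypress callback requires.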

In the same directory as managers.py, let's create a file called cameo.py, where we will implement the Cameo class:

  1. The implementation begins with the following import statements and __init__ method:
import cv2
from managers import WindowManager, CaptureManager

class Cameo(object):

    def __init__(self):
        self._windowManager = WindowManager('Cameo',
                                            self.onKeypress)
        self._captureManager = CaptureManager(
            cv2.VideoCapture(0), self._windowManager, True)
  2. Next, let's add the following implementation of the run() method:
    def run(self):
        """Run the main loop."""
        self._windowManager.createWindow()
        while self._windowManager.isWindowCreated:
            self._captureManager.enterFrame()
            frame = self._captureManager.frame

            if frame is not None:
                # TODO: Filter the frame (Chapter 3).
                pass

            self._captureManager.exitFrame()
            self._windowManager.processEvents()
  3. To complete the Cameo class implementation, here is the onKeypress() method:
    def onKeypress(self, keycode):
        """Handle a keypress.

        space -> Take a screenshot.
        tab -> Start/stop recording a screencast.
        escape -> Quit.

        """
        if keycode == 32: # space
            self._captureManager.writeImage('screenshot.png')
        elif keycode == 9: # tab
            if not self._captureManager.isWritingVideo:
                self._captureManager.startWritingVideo(
                    'screencast.avi')
            else:
                self._captureManager.stopWritingVideo()
        elif keycode == 27: # escape
            self._windowManager.destroyWindow()
  4. Finally, let's add a __main__ block that instantiates and runs Cameo, as follows:
if __name__ == "__main__":
    Cameo().run()

When running the application, note that the live camera feed is mirrored, while screenshots and screencasts are not. This is the intended behavior, as we pass True for shouldMirrorPreview when initializing the CaptureManager class.

[Screenshot: the Cameo window (with the title Cameo) showing the current frame from a camera.]

So far, we do not manipulate the frames in any way except to mirror them for preview. We will start to add more interesting effects in Chapter 3, Processing Images with OpenCV.

Summary

By now, we should have an application that displays a camera feed, listens for keyboard input, and (on command) records a screenshot or screencast. We are ready to extend the application by inserting some image-filtering code (Chapter 3, Processing Images with OpenCV) between the start and end of each frame. Optionally, we are also ready to integrate other camera drivers or application frameworks besides the ones supported by OpenCV.

We also possess the knowledge to manipulate images as NumPy arrays. This forms the perfect foundation for our next topic, filtering images.


Key benefits

  • Build powerful computer vision applications in concise code with OpenCV 4 and Python 3
  • Learn the fundamental concepts of image processing, object classification, and 2D and 3D tracking
  • Train, use, and understand machine learning models such as Support Vector Machines (SVMs) and neural networks

Description

Computer vision is a rapidly evolving science, encompassing diverse applications and techniques. This book will not only help those who are getting started with computer vision but also experts in the domain. You’ll be able to put theory into practice by building apps with OpenCV 4 and Python 3. You’ll start by understanding OpenCV 4 and how to set it up with Python 3 on various platforms. Next, you’ll learn how to perform basic operations such as reading, writing, manipulating, and displaying still images, videos, and camera feeds. From taking you through image processing, video analysis, and depth estimation and segmentation, to helping you gain practice by building a GUI app, this book ensures you’ll have opportunities for hands-on activities. Next, you’ll tackle two popular challenges: face detection and face recognition. You’ll also learn about object classification and machine learning concepts, which will enable you to create and use object detectors and classifiers, and even track objects in movies or video camera feed. Later, you’ll develop your skills in 3D tracking and augmented reality. Finally, you’ll cover ANNs and DNNs, learning how to develop apps for recognizing handwritten digits and classifying a person's gender and age. By the end of this book, you’ll have the skills you need to execute real-world computer vision projects.

Who is this book for?

If you are interested in learning computer vision, machine learning, and OpenCV in the context of practical real-world applications, then this book is for you. This OpenCV book will also be useful for anyone getting started with computer vision as well as experts who want to stay up-to-date with OpenCV 4 and Python 3. Although no prior knowledge of image processing, computer vision or machine learning is required, familiarity with basic Python programming is a must.

What you will learn

  • Install and familiarize yourself with OpenCV 4's Python 3 bindings
  • Understand image processing and video analysis basics
  • Use a depth camera to distinguish foreground and background regions
  • Detect and identify objects, and track their motion in videos
  • Train and use your own models to match images and classify objects
  • Detect and recognize faces, and classify their gender and age
  • Build an augmented reality application to track an image in 3D
  • Work with machine learning models, including SVMs, artificial neural networks (ANNs), and deep neural networks (DNNs)
Product Details

Publication date: Feb 20, 2020
Length: 372 pages
Edition: 3rd
Language: English
ISBN-13: 9781789531619
Vendor: Intel


Table of Contents

11 Chapters
Setting Up OpenCV
Handling Files, Cameras, and GUIs
Processing Images with OpenCV
Depth Estimation and Segmentation
Detecting and Recognizing Faces
Retrieving Images and Searching Using Image Descriptors
Building Custom Object Detectors
Tracking Objects
Camera Models and Augmented Reality
Introduction to Neural Networks with OpenCV
Other Books You May Enjoy

Customer reviews

Rating: 4.1 out of 5 (14 ratings)
5 star 50%
4 star 21.4%
3 star 14.3%
2 star 14.3%
1 star 0%

Anonymous Joe (electrical engineer), May 13, 2021
Rating: 5 out of 5
Arrived in perfect condition. I suggest this book to anyone learning OpenCV with Python.
Amazon verified review

Rz, Feb 22, 2021
Rating: 5 out of 5
I have worked in the CV field for years and wanted to assess how effective Python is for OpenCV, which I have used but always found cumbersome in its C++ version. I started with this book and I am enthusiastic about it. Unlike other books (including some in the same series), it presents the use of OpenCV 4 with Python clearly and thoroughly, illustrating the various possibilities in order of increasing complexity, with an explanation of every line of code, function, and related parameter. Working examples downloaded from the site, for once without effort. Highly recommended, and fun too.
Amazon verified review

Riley Tallman, Apr 28, 2020
Rating: 5 out of 5
I have taken two graduate level courses in image processing at Arizona State University which have taught me many of the concepts in this book like depth estimation, edge detectors, and convolutional filters. The book briefly describes the concepts of the algorithms and then shows the code necessary to perform them. It is a perfect supplement for an intermediate user who is aware of some image processing techniques/concepts but has never performed them with code, like myself. I highly recommend this book. "But can't I just look up tutorials online about OpenCV?" You certainly can and you will probably find what you are looking for. However, when searching online you have to know what you are looking for. With this book, the information comes to you rather than you seeking it out. There are so many algorithms that I did not know about, like the "Harris corner detection algorithm", that I would never have come across if it weren't for this book.
Amazon verified review

Sinjin, Sep 07, 2020
Rating: 5 out of 5
I spent several weeks researching OpenCV books, ordered a few, and was terribly disappointed. Then I stumbled onto this gem. It covers all the major topics under the OpenCV umbrella and does so succinctly and in a very readable manner. In fact, I read the whole book cover to cover without touching the code. This is almost impossible with most computing books, but the authors have a very approachable writing and teaching style. The only place they really slipped up was in the descriptions of SIFT, SURF, FAST, and similar algorithms. Those sections were pretty confusing and unclear. One thing I appreciated was that the authors take time to build a suite of modules, like you'd need in real life applications. More than just providing code, they provide guidance on when to make separate modules and why (generally to facilitate expansion of the program later). I was particularly interested in detecting custom objects, and had become very frustrated with the approaches in other books and in online tutorials. The authors' approach is straightforward and easy to follow, and I am currently applying it. Unlike with other Packt books, I had no problem downloading the book's code; however, I did run into some difficulty finding and downloading some of the public training datasets (such as for detecting cars). Despite this, the book is still head-and-shoulders above other OpenCV resources I have attempted to use. If you are just getting started with OpenCV, do yourself a favor and get a copy of Learning OpenCV.
Amazon verified review

roudan, Jan 24, 2021
Rating: 5 out of 5
This is a great book for learning the OpenCV library, image processing, detection, and so on. It made me interested in computer vision. My favorite part is the Python code that comes with the book. It all works and I can easily use the code for my application, so the book is definitely worth it! Thanks, Joseph, for this great book. I thoroughly enjoyed it.
Amazon verified review
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is the delivery time and cost of print book? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is custom duty/charge? Chevron down icon Chevron up icon

Customs duty are charges levied on goods when they cross international borders. It is a tax that is imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order? Chevron down icon Chevron up icon

The orders shipped to the countries that are listed under EU27 will not bear custom charges. They are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea:

A custom duty or localized taxes may be applicable on the shipment and would be charged by the recipient country outside of the EU27 which should be paid by the customer and these duties are not included in the shipping charges been charged on the order.

How do I know my custom duty charges? Chevron down icon Chevron up icon

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
How can I cancel my order? Chevron down icon Chevron up icon

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except for the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or material defect in book), Packt Publishing will not accept returns.

What is your returns and refunds policy? Chevron down icon Chevron up icon

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on customercare@packt.com with the order number and issue details as explained below:

  1. If you ordered an item (eBook, Video or Print Book) incorrectly or accidentally, please contact the Customer Relations Team at customercare@packt.com within one hour of placing the order and we will replace the item or refund its cost.
  2. If your eBook or Video file is faulty, or a fault occurs while the eBook or Video is being made available to you (i.e. during download), contact the Customer Relations Team at customercare@packt.com within 14 days of purchase and they will resolve the issue for you.
  3. You will have a choice of replacement or refund for the problem items (damaged, defective or incorrect).
  4. Once the Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are requesting a refund for only one book from a multi-item order, we will refund the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged or with a material defect, contact our Customer Relations Team at customercare@packt.com within 14 days of receipt of the book with appropriate evidence of the damage, and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made on a print-on-demand basis by Packt's professional book-printing partner.

What tax is charged?

Currently, no tax is charged on the purchase of any print book (subject to change based on applicable laws and regulations). A localized VAT fee is charged only to our European and UK customers on the eBooks, videos and subscriptions that they buy. GST is charged to Indian customers on eBook and video purchases.

What payment methods can I use?

You can pay with the following payment methods:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal
What is the delivery time and cost of print books?

Shipping Details

USA:

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable.

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days.
Add one extra business day for deliveries to Northern Ireland and the Scottish Highlands and islands.

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
Orders received before 5 PM U.K. time on a business day will start printing on the next business day, so the estimated delivery times start from that day as well. Orders received after 5 PM U.K. time (in our internal systems) on a business day, or at any time on the weekend, will begin printing on the second business day after the order. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.
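
As a rough illustration of the cutoff rule above, the following Python sketch estimates the day printing begins. It is not Packt's actual system: the names `add_business_days` and `print_start_date` are hypothetical, the timestamp is assumed to be already expressed in U.K. local time, business days are assumed to be Monday to Friday with public holidays ignored, and an order placed at exactly 5 PM is assumed to count as after the cutoff.

```python
from datetime import date, datetime, time, timedelta

CUTOFF = time(17, 0)  # 5 PM U.K. time, per the disclaimer


def add_business_days(start: date, count: int) -> date:
    """Step forward `count` business days (Mon-Fri); holidays are ignored."""
    current = start
    while count > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            count -= 1
    return current


def print_start_date(ordered_at_uk: datetime) -> date:
    """Estimate when printing begins for an order time given in U.K. time."""
    is_business_day = ordered_at_uk.weekday() < 5
    before_cutoff = ordered_at_uk.time() < CUTOFF
    # Before the cutoff on a business day: next business day.
    # Otherwise (late or weekend order): second business day after.
    steps = 1 if (is_business_day and before_cutoff) else 2
    return add_business_days(ordered_at_uk.date(), steps)


# A Wednesday order at 11 AM versus the same day at 9 PM
print(print_start_date(datetime(2024, 1, 10, 11, 0)))
print(print_start_date(datetime(2024, 1, 10, 21, 0)))
```

The delivery estimates listed for each region would then be counted from the returned date.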


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-Bissau
  9. Iran
  10. Lebanon
  11. Libyan Arab Jamahiriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela