OpenCV is a computer vision and machine learning library that has been developed for more than 20 years and provides an impressive number of functionalities. Despite some inconsistencies in the API, its simplicity and the remarkable number of algorithms implemented make it an extremely popular library and an excellent choice for many situations.
OpenCV is written in C++, but there are bindings for Python and Java, and it is also available on Android.
In this book, we will focus on OpenCV for Python, with all the code tested using OpenCV 4.2.
OpenCV in Python is provided by the opencv-python package, which can be installed using the following command:
pip install opencv-python
OpenCV can take advantage of hardware acceleration, but to get the best performance, you might need to build it from the source code, with different flags than the default, to optimize it for your target hardware.
OpenCV and NumPy
The Python bindings use NumPy, which makes them flexible and compatible with many other libraries. As an OpenCV image is a NumPy array, you can use normal NumPy operations to get information about the image. A good understanding of NumPy can improve the performance and reduce the length of your code.
Let's dive right in with some quick examples of what you can do with NumPy in OpenCV.
Image size
The size of the image can be retrieved using the shape attribute:
print("Image size: ", image.shape)
For a 50x50 grayscale image, image.shape (an attribute, not a method, so it takes no parentheses) would return the tuple (50, 50), while for an RGB image, the result would be (50, 50, 3).
False friends
In NumPy, the attribute size is the total number of elements in the array, not its dimensions; for a 50x50 gray image, it would be 2,500, while for the same image in RGB, it would be 7,500. With the uint8 type, this is also the size in bytes, because each element takes one byte. It's the shape attribute that contains the size of the image: (50, 50) and (50, 50, 3), respectively.
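To see the difference between these attributes in practice, the following is a minimal sketch. It uses arrays created with NumPy as stand-ins for loaded images (the names gray, color, and gray_f32 are only illustrative), and it also prints nbytes, the NumPy attribute that reports the size in bytes:

```python
import numpy as np

# Illustrative stand-ins for loaded images: 50x50 grayscale and 50x50 color
gray = np.zeros((50, 50), dtype=np.uint8)
color = np.zeros((50, 50, 3), dtype=np.uint8)

print("Gray shape: ", gray.shape)    # (50, 50) -> (rows, columns)
print("Color shape: ", color.shape)  # (50, 50, 3) -> (rows, columns, channels)

# size counts elements, nbytes counts bytes; they match only for 1-byte types
print("Gray size: ", gray.size, " bytes: ", gray.nbytes)
print("Color size: ", color.size, " bytes: ", color.nbytes)

# With a wider type, the number of elements is unchanged but the bytes grow
gray_f32 = gray.astype(np.float32)
print("float32 size: ", gray_f32.size, " bytes: ", gray_f32.nbytes)
```

If you need the width and height of an image, read them from shape, remembering that the rows (height) come first.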
Grayscale images
Grayscale images are represented by a two-dimensional NumPy array. The first index selects the row (y coordinate) and the second index selects the column (x coordinate). Both coordinates have their origin in the top-left corner of the image: y increases going down and x increases going right.
It is possible to create a black image using np.zeros(), which initializes all the pixels to 0:
black = np.zeros([100, 100], dtype=np.uint8)  # Creates a black image
The previous code creates a grayscale image with size (100, 100), composed of 10,000 unsigned bytes (dtype=np.uint8).
To create an image whose pixels have a value other than 0, you can use the np.full() function:
white = np.full([50, 50], 255, dtype=np.uint8)
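As a quick check of the two calls above, the following sketch creates a few images and inspects them. The mid_gray image and the value 128 are only examples; the point worth noticing is that, without dtype=np.uint8, np.zeros() would create an array of 64-bit floats, which is not what you normally want for an 8-bit image:

```python
import numpy as np

black = np.zeros([100, 100], dtype=np.uint8)       # all pixels set to 0
white = np.full([50, 50], 255, dtype=np.uint8)     # all pixels set to 255
mid_gray = np.full([50, 50], 128, dtype=np.uint8)  # example value: medium gray

print(black.shape, black.dtype, black.max())
print(white.shape, white.dtype, white.min())
print(mid_gray[0, 0])

# Without an explicit dtype, NumPy defaults to float64
default_type = np.zeros([100, 100])
print(default_type.dtype)
```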
To change the color of all the pixels at once, it's possible to use the [:] notation:
img[:] = 64 # Change the pixels color to dark gray
To affect only some rows, it is enough to provide a range of rows in the first index:
img[10:20] = 192 # Paints 10 rows with light gray
The previous code changes the color of rows 10-20, including row 10, but excluding row 20.
The same mechanism works for columns; you just need to specify the range in the second index. To instruct NumPy to include every element along an index (here, all the rows), we use the [:] notation that we already encountered:
img[:, 10:20] = 64 # Paints 10 columns with dark gray
You can also combine operations on rows and columns, selecting a rectangular area:
img[90:100, 90:100] = 0 # Paints a 10x10 area with black
It is, of course, possible to operate on a single pixel, as you would do on a normal array:
img[50, 50] = 0 # Paints one pixel with black
It is possible to use NumPy to select a part of an image, also called the Region Of Interest (ROI). For example, the following code copies a 10x10 ROI from the position (90, 90) to the position (80, 80):
roi = img[90:100, 90:100]
img[80:90, 80:90] = roi
The following is the result of the previous operations:
Figure 1.1 – Some manipulation of images using NumPy slicing
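If you want to experiment with these operations, the following sketch runs them in sequence on a single image. The starting image (100x100, white) is an assumption, as the snippets above do not show how img was created; the order matters, because each assignment overwrites the pixels painted by the previous ones:

```python
import numpy as np

# Assumed starting point: a 100x100 white grayscale image
img = np.full([100, 100], 255, dtype=np.uint8)

img[:] = 64               # whole image: dark gray
img[10:20] = 192          # rows 10 to 19: light gray
img[:, 10:20] = 64        # columns 10 to 19: dark gray (overwrites part of the rows)
img[90:100, 90:100] = 0   # 10x10 black square in the bottom-right corner
img[50, 50] = 0           # a single black pixel

roi = img[90:100, 90:100]  # select the black square as the ROI
img[80:90, 80:90] = roi    # paste it at (80, 80)

print(img[10, 50], img[19, 50], img[20, 50])  # row 20 is not painted
print(img[15, 15])                            # painted by the column operation
print(img[85, 85], img[95, 95])               # both squares are black
```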
To make a copy of an image, you can simply use the copy() method:
image2 = image.copy()
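The reason copy() matters is that slicing a NumPy array returns a view that shares memory with the original, so writing to the slice also changes the image. The following sketch illustrates the difference; the names view and image2 are only illustrative, and np.shares_memory() is used just to make the relationship visible:

```python
import numpy as np

image = np.zeros([100, 100], dtype=np.uint8)

view = image[0:10, 0:10]  # a slice: shares memory with image
image2 = image.copy()     # an independent copy

view[:] = 255             # writing to the view changes image too

print(image[0, 0])        # 255: modified through the view
print(image2[0, 0])       # 0: the copy is unaffected
print(np.shares_memory(view, image))    # True
print(np.shares_memory(image2, image))  # False
```

When you want to modify an ROI without touching the original image, call copy() on the slice first.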
RGB images
RGB images differ from grayscale because they are three-dimensional, with the third index representing the three channels. Please note that OpenCV stores the images in BGR format, not RGB, so channel 0 is blue, channel 1 is green, and channel 2 is red.
Important note
OpenCV stores the images as BGR, not RGB. In the rest of the book, when talking about RGB images, it will only mean that it is a 24-bit color image, but the internal representation will usually be BGR.
To create an RGB image, we need to provide three sizes:
rgb = np.zeros([100, 100, 3], dtype=np.uint8)
If you ran the same code previously used on the grayscale image with the new RGB image (skipping the third index), you would get the same result. This is because NumPy would apply the same value to all three channels, which results in a shade of gray.
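As a small illustration of this behavior, the following sketch applies one of the grayscale operations to a three-channel image and checks that every channel received the same value (the row range and the value are just the ones used earlier):

```python
import numpy as np

rgb = np.zeros([100, 100, 3], dtype=np.uint8)

# No third index: the value is applied to all three channels
rgb[10:20] = 192

print(rgb[15, 50].tolist())  # [192, 192, 192]: a shade of gray
print(rgb[50, 50].tolist())  # [0, 0, 0]: untouched pixels stay black
```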
To select a color, it is enough to provide the third index:
rgb[:, :, 2] = 255 # Makes the image red
In NumPy, it is also possible to select rows, columns, or channels that are not contiguous. You can do this by simply providing a tuple with the required indexes. To make the image magenta, you need to set the blue and red channels to 255, which can be achieved with the following code:
rgb[:, :, (0, 2)] = 255 # Makes the image magenta
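To verify the channel order, you can print a single pixel after each assignment. The following sketch does that; the last assignment, which sets all three channels at once from a (blue, green, red) tuple, is an additional example not taken from the code above:

```python
import numpy as np

rgb = np.zeros([100, 100, 3], dtype=np.uint8)

rgb[:, :, 2] = 255            # channel 2 is red in BGR order
print(rgb[0, 0].tolist())     # [0, 0, 255]

rgb[:, :, (0, 2)] = 255       # blue and red together: magenta
print(rgb[0, 0].tolist())     # [255, 0, 255]

# Additional example: assign a full (blue, green, red) color to every pixel
rgb[:] = (0, 255, 255)        # green + red: yellow
print(rgb[0, 0].tolist())     # [0, 255, 255]
```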
You can convert an RGB image into grayscale using cvtColor():
gray = cv2.cvtColor(original, cv2.COLOR_BGR2GRAY)