Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
The Computer Vision Workshop
The Computer Vision Workshop

The Computer Vision Workshop: Develop the skills you need to use computer vision algorithms in your own artificial intelligence projects

Arrow left icon
Profile Icon Hafsa Asad Profile Icon Vishwesh Ravi Shrimali Profile Icon Nikhil Singh
Arrow right icon
$9.99 $38.99
Full star icon Full star icon Full star icon Full star icon Half star icon 4.3 (6 Ratings)
eBook Jul 2020 568 pages 1st Edition
eBook
$9.99 $38.99
Paperback
$87.99
Subscription
Free Trial
Renews at $19.99p/m
Arrow left icon
Profile Icon Hafsa Asad Profile Icon Vishwesh Ravi Shrimali Profile Icon Nikhil Singh
Arrow right icon
$9.99 $38.99
Full star icon Full star icon Full star icon Full star icon Half star icon 4.3 (6 Ratings)
eBook Jul 2020 568 pages 1st Edition
eBook
$9.99 $38.99
Paperback
$87.99
Subscription
Free Trial
Renews at $19.99p/m
eBook
$9.99 $38.99
Paperback
$87.99
Subscription
Free Trial
Renews at $19.99p/m

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
OR
Modal Close icon
Payment Processing...
tick Completed

Billing Address

Table of content icon View table of contents Preview book icon Preview Book

The Computer Vision Workshop

2. Common Operations When Working with Images

Overview

In this chapter, we will take a look at geometric transformations – rotation, translation, scaling, affine transformation, and perspective transformation. We will crop images using NumPy and OpenCV functions. Then, we will discuss binary images and how to carry out arithmetic operations on images. We will have a look at some real-life applications where these techniques can come in handy.

By the end of this chapter, you should have a fairly good idea of how to process images for your specific case study. You will be able to use affine transformations on images and carry out tasks such as motion identification using binary operations.

Introduction

If you've ever seen a "behind-the-scenes" video of your favorite Hollywood movie, you might have noticed that some of the animated scenes are shot in front of a green screen. Then, during editing, the green screen is replaced with a mind-blowing futuristic vista or a dystopian scene that's beyond your wildest imagination.

Now, think of one single frame from that scene in the movie. The original frame was shot in front of a green background. Then, the image was modified to obtain a particular result, that is, a different background.

Furthermore, sometimes, you want to process an image to directly obtain the result you seek so that you can receive an intermediate result that will make further steps easier and achievable. The following is a sample picture that was taken in front of the green screen:

Figure 2.1: A picture taken in front of a green screen

Figure 2.1: A picture taken in front of a green screen

This activity of modifying or processing an image to obtain a particular result is known as image processing.

In the previous chapter, we went over the basics of images – what pixels are, what pixel coordinates are, how to extract pixel values using pixel coordinates, and more. We'll start this chapter by understanding what we mean by image processing and why we need it. By the end of this chapter, you will be able to process images such as the one shown previously and replace the green screen with a background of your choice using a very basic image processing technique referred to as masking.

This chapter can be broken down into two major parts. First, we will focus on basic techniques, such as translation, rotation, resizing, and cropping, and a more general geometric transformation called affine transformation. We will end the section by looking at perspective transformation. Then, we will discuss binary images and arithmetic operations that we can carry out on images. We will also talk about masking in this section.

Both sections will be accompanied by exercises and activities that you can try to complete. Now, let's start with the first section of this chapter – Geometric Transformations.

Geometric Transformations

Often, during image processing, the geometry of the image (such as the width and height of the image) needs to be transformed. This process is called geometric transformation. As we saw in the Images in OpenCV section of Chapter 1, Basics of Image Processing, images are nothing but matrices, so we can use a more mathematical approach to understand these topics.

Since images are matrices, if we apply an operation to the images (matrices) and we end up with another matrix, we call this a transformation. This basic idea will be used extensively to understand and apply various kinds of geometric transformations.

Here are the geometric transformations that we are going to discuss in this chapter:

  • Translation
  • Rotation
  • Image scaling or resizing
  • Affine transformation
  • Perspective transformation

While discussing image scaling, we will also discuss image cropping. which isn't actually a geometric transformation, though it is commonly used along with resizing.

First, we will go over image translation, where we will discuss moving an image in a specified direction.

Image Translation

Image translation is also referred to as shifting an image. In this transformation, our basic intention is to move an image along a line. Take a look at the following figure:

Figure 2.2: Image translation

Figure 2.2: Image translation

For example, if we consider the following figure, we are shifting the human sketch to the right. We can shift an image in both the X and Y directions, as well as separately. What we are doing in this process is shifting every pixel's location in the image in a certain direction. Consider the following figure:

Figure 2.3: Pixel view of image translation

Figure 2.3: Pixel view of image translation

Using the preceding figure as reference, we can represent image translation with the following matrix equation:

Figure 2.4: Representation of an image matrix

Figure 2.4: Representation of an image matrix

In the preceding equation, x' and y' represent the new coordinates of a point/pixel after it has been shifted by a units in the x direction and b units in the y direction.

Note

Even though we have discussed image translation, it is very rarely used in image processing. It is usually combined with other transformations to create a more complex transformation. We will be studying this general class of transformations in the Affine Transformations section.

Now that we have discussed the theory of image translation, let's understand how translation can be performed using NumPy. We stated that images are nothing but NumPy arrays while using them in OpenCV in Python. Similarly, geometric transformations are just matrix multiplications. We also saw how image translation can be represented using a matrix equation. In terms of matrix multiplication, the same image equation can be written as a NumPy array:

M = np.array([[1,0,a],[0,1,b],[0,0,1])

If you look carefully, you will notice that the first two columns of this matrix form an identity matrix ([[1,0],[0,1]]). This signifies that even though we are transforming the image, the image dimensions (width and height) will stay the same. The last column is made up of a and b, which signifies that the image has been shifted by a units in the x direction and b units in the y direction. The last row, - [0,0,1], is only used to make the matrix, M, a square matrix – a matrix with the same number of rows and columns. Now that we have the matrix for image translation with us, whenever we want to shift an image, all we need to do is multiply the image, let's say, img (as a NumPy array), with the array, M. This can be performed as follows:

output = M@img

Here, output is the output that's obtained after image translation.

Note that in the preceding code, it's assumed that every point that we want to shift represents a column in the matrix, that is, img. Moreover, usually, an extra row full of ones is also added in the img matrix. That's simply to make sure that that matrix multiplication can be carried out. To understand it, let's understand the dimensions of the matrix, M. It has three rows and three columns. For the matrix multiplication of M and img to be carried out, img should have the same number of rows as the number of columns in matrix M, that is, three.

Now, we also know that a point is represented as a column in the img matrix. We already have the x coordinate of the point in the first row and the y coordinate of the point in the second row of the img matrix. To make sure that we have three rows in the img matrix, we add 1 in the third row.

For example, point [2,3] will be presented as [[2],[3],[1]], that is, 2 is present in the first row, 3 is present in the second row. and 1 is present in the third row.

Similarly, if we want to represent two points – [2,3] and [1,0] – they will be represented as [[2,1],[3,0],[1,1]]; that is, the x coordinates of both points (2 and 1) are present in the first row, the y coordinates (3 and 0) are present in the second row, and the third row is made up of 1s.

Let's understand this better with the help of an exercise.

Exercise 2.01: Translation Using NumPy

In this exercise, we will carry out the geometric transformation known as translation using NumPy. We will be using the translation matrix that we looked at earlier to carry out the translation. We will be shifting 3 points by 2 units in the x direction and 3 units in the y direction. Let's get started:

  1. Create a new Jupyter Notebook – Exercise2.01.ipynb. We will be writing our code in this notebook.
  2. First, let's import the NumPy module:
    # Import modules
    import numpy as np
  3. Next, we will specify the three points that we want to shift – [2,3], [0,0], and [1,2]. Just as we saw previously, the points will be represented as follows:
    # Specify the 3 points that we want to translate
    points = np.array([[2,0,1],
                      [3,0,2],
                      [1,1,1]])

    Note how the x coordinates (2, 0, and 1) make up the first row, the y coordinates (3, 0, and 2) make up the second row, and the third row has only 1s.

  4. Let's print the points matrix:
    print(points)

    The output is as follows:

    [[2 0 1]
     [3 0 2]
     [1 1 1]]
  5. Next, let's manually calculate the coordinates of these points after translation:
    # Output points - using manual calculation
    outPoints = np.array([[2 + 2, 0 + 2, 1 + 2],
                [3 + 3, 0 + 3, 2 + 3],
                [1, 1, 1]])
  6. Let's also print the outPoints matrix:
    print(outPoints)

    The output is as follows:

    [[4 2 3]
     [6 3 5]
     [1 1 1]]
  7. We need to shift these points by 2 units in the x direction and 3 units in the y direction:
    # Move by 2 units along x direction
    a = 2
    # Move by 3 units along y direction
    b = 3
  8. Next, we will specify the translation matrix, as we saw previously:
    # Translation matrix
    M = np.array([[1,0,a],[0,1,b],[0,0,1]])
  9. Now, we can print the translation matrix:
    print(M)

    The output is as follows:

    [[1 0 2]
     [0 1 3]
     [0 0 1]]
  10. Now, we can perform the translation using matrix multiplication:
    # Perform translation using NumPy
    output = M@points
  11. Finally, we can display the output points by printing them:
    print(output)

    The output is as follows:

    [[4 2 3]
     [6 3 5]
     [1 1 1]]

Notice how the output points obtained using NumPy and using manual calculation match perfectly. In this exercise, we shifted the given points using manual calculation, as well as using matrix multiplication. We also saw how we can carry out matrix multiplication using the NumPy module in Python. The key point to focus on here is that transition is nothing but a matrix operation, just as we discussed previously at the beginning of the Geometric Transformations section.

Note

To access the source code for this specific section, please refer to https://packt.live/3eQdjhR.

So far, we've discussed moving an image along a direction. Next, let's discuss rotating an image around a specified point at a given angle.

Image Rotation

Image rotation, as the name suggests, involves rotating an image around a point at a specified angle. The most common notation is to specify the angle as positive if it's anti-clockwise and negative if it's specified in the clockwise direction. We will assume angles in a non-clockwise direction are positive. Let's see whether we can derive the matrix equation for this. Let's consider the case shown in the following diagram: w and h refer to the height and width of an image, respectively. The image is rotated about the origin (0,0) at an angle, alpha (α), in an anti-clockwise direction:

Figure 2.5: Image rotation

Figure 2.5: Image rotation

One thing that we can note very clearly here is that the distance between a point and the origin stays the same, even after rotation. We will be using this point when deriving the rotation matrix.

We can divide the entire problem into two parts:

  • Finding the rotation matrix
  • Finding the size of the new image

Finding the Rotation Matrix

Let's consider a point, P(x,y), which, after rotation by angle β around the origin, O(0,0), is transformed into P'(x',y'). Let's also assume the distance between point P and origin O is r:

Figure 2.6: Point rotation

Figure 2.6: Point rotation

In the following equation, we have used the distance formula to obtain the distance between P and O:

Figure 2.7: Distance formula

Figure 2.7: Distance formula

Since we know that this distance will stay the same, even after rotation, we can write the following equation:

Figure 2.8: Distance formula after rotation

Figure 2.8: Distance formula after rotation

Let's assume that the P(x,y) point makes an angle, α, with the X-axis. Here, the formula will look as follows:

x = r cos(α)
y = r sin(α)

Similarly, the P'(x',y') point will make an angle, α + β, with the X-axis. Thus, the formula will look as follows:

x' = r cos(α + β)
y› = r sin(α + β)

Next, we will use the following trigonometric identity:

cos(α + β) = cos α cos β – sin α sin β

Thus, we can now simplify the equation for x', as follows:

Figure 2.9: Simplified equation for x’

Figure 2.9: Simplified equation for x'

Similarly, let's simplify the equation for y' using the following trigonometric identity:

sin(α + β) = sin α cos β – cos α sin β

Using the preceding equation, we can simplify y' as follows:

Figure 2.10: Simplified equation for y’

Figure 2.10: Simplified equation for y'

We can now represent x' and y' using the following matrix equation:

Figure 2.11: Representing x’ and y’ using the matrix equation

Figure 2.11: Representing x' and y' using the matrix equation

Now that we have obtained the preceding equation, we can transform any point into a new point if it's rotated by a given angle. The same equation can be applied to every pixel in an image to obtain the rotated image.

But have you ever seen a rotated image? Even though it's rotated, it still lies within a rectangle. This means the dimensions of the new image can change, unlike in translation, where the dimensions of the output image and the input image stayed the same. Let's see how we can find the dimensions of the image.

Finding the Size of the Output Image

We will consider two cases here. The first case is that we keep the dimensions of the output image the same as the ones for the input image, while the second case is that we modify the dimensions of the output image. Let's understand the difference between them by looking at the following diagram. The left half shows the case when the image dimensions are kept the same, even after rotation, whereas in the right half of the diagram, we relax the dimensions to cover the entire rotated image. You can see the difference in the results obtained in both cases. In the following diagram, L and H refer to the dimensions of the original image, while L' and H' refer to the dimensions after rotation.

The image is rotated by an angle, ϴ, in an anti-clockwise direction, around the center of the image:

Figure 2.12: Size of the rotated image

Figure 2.12: Size of the rotated image

In Figure 2.12, the size of the rotated image is presented depending on whether the image dimensions are kept the same or are modified while rotating the image

For the case where we want to keep the size of the image the same as the initial image's size, we just crop out the extra region. Let's learn how to obtain the size of the rotated image, if we don't want to keep the dimensions the same:

Figure 2.13: Size of the image after rotation

Figure 2.13: Size of the image after rotation

In the preceding diagram, the original image, ABCD, has been rotated along the center of the image by an angle, θ, to form the new image A'B'C'D'.

Referring to the preceding diagram, we can write the following equations:

A'Q = L cos θ
A›P = H sin θ

Thus, we can obtain L' in terms of L and H as follows. Note that L' refers to the size of the image after rotation. Refer to the preceding diagram for a visual representation:

L' = PQ = PA' + A'Q = H sin θ + L cos θ

Similarly, we can write the following equations:

PD' = H cos θ
D›S = L sin θ

Thus, we can obtain H' in terms of L and H, as follows:

H' = PS = PD' + D'S = L sin θ + H cos θ

Here comes a tricky part. We know the values of cosine and sine can also be negative, but the size of the rotated image cannot be smaller than the size of the input image. Thus, we will use the absolute values of sine and cosine. The equations for L' and H' will be modified so that they look as follows:

L' = L|cos θ| + H | sin θ|
H› = L|sin θ| + H | cos θ|

We can round off these values to obtain an integer value for the new width and height of the rotated image.

Now comes the question that you might be wanting to ask by now. Do we have to do all this calculation every time we want to rotate an image? Luckily, the answer is no. OpenCV has a function that we can use if we want to rotate an image by a given angle. It also gives you the option of scaling the image. The function we are talking about here is the cv2.getRotationMatrix2D function. It generates the rotation matrix that can be used to rotate the image. This function takes three arguments:

  • The point that we want to rotate the image around. Usually, the center of the image or the bottom-left corner is chosen as the point.
  • The angle in degrees that we want to rotate the image by.
  • The factor that we want to change the dimensions of the image by. This is an optional argument and can be used to shrink or enlarge the image. In the preceding case, it was 1 as we didn't resize the image.

Note that we are only generating the rotation matrix here and not actually carrying out the rotation. We will cover how to use this matrix when we will look at affine transformations. For now, let's have a look at image resizing.

Image Resizing

Imagine that you are training a deep learning model or using it to carry out some prediction – for example, object detection, image classification, and so on. Most deep learning models (that are used for images) require the input image to be of a fixed size. In such situations, we resize the image to match those dimensions. Image resizing is a very simple concept where we modify an image's dimensions. This concept has its applications in almost every computer vision domain (image classification, face detection, face recognition, and so on).

Images can be resized in two ways:

  • Let's say that we have the initial dimensions as W×H, where W and H stand for width and height, respectively. If we want to double the size (dimensions) of the image, we can resize or scale the image up to 2W×2H. Similarly, if we want to reduce the size (dimensions) of the image by half, then we can resize or scale the image down to W/2×H/2. Since we just want to scale the image, we can pass the scaling factors (for length and width) while resizing. The output dimensions can be calculated based on these scaling factors.
  • We might want to resize an image to a fixed dimension, let's say, 420×360 pixels. In such a situation, scaling won't work as you can't be sure that the initial dimensions are going to be a multiple (or factor) of the fixed dimension. This requires us to pass the new dimensions of the image directly while resizing.

A very interesting thing happens when we resize an image. Let's try to understand this with the help of an example:

Figure 2.14: Image to resize

Figure 2.14: Image to resize

The preceding figure shows the image and the pixel values that we want to resize. Currently, it's of size 5×5. Let's say we want to double the size. This results in the following output. However, we want to fill in the pixel values:

Figure 2.15: Resized image

Figure 2.15: Resized image

Let's look at the various options we have. We can always replicate the pixels. This will give us the result shown in the following figure:

Figure 2.16: Resized image by replicating pixel values

Figure 2.16: Resized image by replicating pixel values

If we remove the pixel values from the preceding image, we will obtain the image shown in the following figure. Compare this with the original image, Figure 2.14. Notice how similar it looks to the original image:

Figure 2.17: Resized image without annotation

Figure 2.17: Resized image without annotation

Similarly, if we want to reduce the size of the image by half, we can drop some pixels. One thing that you would have noticed is that while resizing, we are replicating the pixels, as shown in Figure 2.16 and Figure 2.17. There are some other techniques we can use as well. We can use interpolation, in which we find out the new pixel values based on the pixel values of the neighboring pixels, instead of directly replicating them. This gives a nice smooth transition in colors. The following figure shows how the results vary if we use different interpolations. From the following figure, we can see that as we progress from left to right, the pixel values of the newly created pixels are calculated differently. In the first three images, the pixels are directly copied from the neighboring pixels, whereas in the later images, the pixel values depend on all the neighboring pixels (left, right, up, and down) and, in some cases, the diagonally adjacent pixels as well:

Figure 2.18: Different results obtained based on different interpolations used

Figure 2.18: Different results obtained based on different interpolations used

Now, you don't need to worry about all the interpolations that are available. We will stick to the following three interpolations for resizing:

  • If we are shrinking an image, we will use bilinear interpolation. This is represented as cv2.INTER_AREA in OpenCV.
  • If we are increasing the size of an image, we will use either linear interpolation (cv2.INTER_LINEAR) or cubic interpolation (cv2.INTER_CUBIC).

Let's look at how we can resize an image using OpenCV's cv2.resize function:

cv2.resize(src, dsize, fx, fy, interpolation)

Let's have a look at the arguments of this function:

  • src: This argument is the image we want to resize.
  • dsize: This argument is the size of the output image (width, height). This will be used when you know the size of the output image. If we are just scaling the image, we will set this to None.
  • fx and fy: These arguments are the scaling factors. These will only be used when we want to scale the image. If we already know the size of the output image, we will skip these arguments. These arguments are specified as fx = 5, fy = 3, if we want to scale the image by five in the x direction and three in the y direction.
  • interpolation: This argument is the interpolation that we want to use. This is specified as interpolation = cv2.INTER_LINEAR if we want to use linear interpolation. The other interpolation flags have already been mentioned previously – cv2.INTER_AREA and cv2.INTER_CUBIC.

Next, let's have a look at a more general transformation – affine transformation.

Affine Transformation

Affine transformation is one of the most important geometric transformations in computer vision. The reason for this is that an affine transformation can combine the effects of translation, rotation, and resizing into one transform. Affine transformation in OpenCV uses a 2×3 matrix and then applies the effect using the matrix using the cv2.warpAffine function. This function takes three arguments:

cv2.warpAffine(src, M, dsize)

Let's take a look at each argument:

  • The first argument (src) is the image that we want to apply the transformation to.
  • The second argument (M) is the transformation matrix.
  • The third argument (dsize) is the shape of the output image. The order that's used is (width, height) or (number of columns, number of rows).

Let's have a look at the transformation matrices for different transformations and how to generate them:

Figure 2.19: Table with transformation matrices for different transformations

Figure 2.19: Table with transformation matrices for different transformations

To generate the transformation matrix for affine transformation, we choose any three non-collinear points on the input image and the corresponding points on the output image. Let's call the points (in1x, in1y), (in2x, in2y), (in3x, in3y) for the input image, and (out1x, out1y), (out2x, out2y), (out3x, out3y) for the output image.

Then, we can use the following code to generate the transformation matrix. We can use the following code to create a NumPy array for storing the points:

ptsInput = np.float32([[in1x, in1y],[in2x, in2y],\
                      [in3x, in3y]])

Alternatively, we can also use the following code:

ptsOutput = np.float32([[out1x, out1y], [out2x, out2y], \
                       [out3x, out3y]])

Next, we will pass these two NumPy arrays to the cv2.getAffineTransform function, as follows:

M = cv2.getAffineTransform(ptsInput, ptsOutput)

Now that we have seen how to generate the transformation matrix, let's see how to apply it. This can be done using outputImage = cv2.warpAffine(inputImage, M, (outputImageWidth, outputImageHeight)).

Now that we have discussed affine transformation and how to apply it, let's put this knowledge to use in the following exercise.

Exercise 2.02: Working with Affine Transformation

In this exercise, we will use the OpenCV functions that we discussed in the previous sections to translate, rotate, and resize an image. We will be using the following image in this exercise.

Figure 2.20: Image of a drop

Figure 2.20: Image of a drop

Note

This image is available at https://packt.live/3geu9Hh.

Follow these steps to complete this exercise:

  1. Create a new Jupyter Notebook – Exercise2.02.ipynb. We will be writing our code in this notebook.
  2. Start by importing the required modules:
    # Import modules
    import cv2
    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline
  3. Next, read the image.

    Note

    Before proceeding, be sure to change the path to the image (highlighted) based on where the image is saved in your system.

    The code is as follows:

    img = cv2.imread("../data/drip.jpg")
  4. Display the image using Matplotlib:
    plt.imshow(img[:,:,::-1])
    plt.show()

    The output is as follows. The X and Y axes refer to the width and height of the image, respectively:

    Figure 2.21: Image with X and Y axes

    Figure 2.21: Image with X and Y axes

  5. Convert the image into grayscale for ease of use:
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
  6. Store the height and width of the image:
    height,width = img.shape
  7. Start with translation. Shift the image by 100 pixels to the right and 100 pixels down:
    # Translation
    tx = 100
    ty = 100
    M = np.float32([[1,0,tx],[0,1,ty]])
    dst = cv2.warpAffine(img,M,(width,height))
    plt.imshow(dst,cmap="gray")
    plt.show()

    The preceding code produces the following output. The X and Y axes refer to the width and height of the image, respectively:

    Figure 2.22: Translated image

    Figure 2.22: Translated image

  8. Next, rotate our image 45 degrees anti-clockwise, around the center of the image, and scale it up twice:
    # Rotation
    angle = 45
    center = (width//2, height//2)
    scale = 2
    M = cv2.getRotationMatrix2D(center,angle,scale)
    dst = cv2.warpAffine(img,M,(width,height))
    plt.imshow(dst,cmap="gray")
    plt.show()

    The output is as follows. The X and Y axes refer to the width and height of the image, respectively:

    Figure 2.23: Rotated image

    Figure 2.23: Rotated image

  9. Finally, double the size of the image using the cv2.resize function:
    # Resizing image
    print("Width of image = {}, Height of image = {}"\
          .format(width, height))
    dst = cv2.resize(img, None, fx=2, fy=2, \
                     interpolation=cv2.INTER_LINEAR)
    height, width = dst.shape
    print("Width of image = {}, Height of image = {}"\
          .format(width, height))

    The output is as follows:

    Width of image = 1280, Height of image = 849
    Width of image = 2560, Height of image = 1698

In this exercise, we learned how to translate, rotate, and resize an image using OpenCV functions.

Note

To access the source code for this specific section, please refer to https://packt.live/38iUnFZ.

Next, let's have a look at the final geometric transformation of this chapter – perspective transformation.

Perspective Transformation

Perspective transformation, unlike all the transformations we have seen so far, requires a 3×3 transformation matrix. We won't go into the mathematics behind this matrix as it's out of the scope of this book. We will need four points in the input image and the coordinates of the same points in the output image. Note that these points should not be collinear. Next, similar to the affine transformation steps, we will carry out perspective transformation.

We will use the following code create a NumPy array for storing the points:

ptsInput = np.float32([[in1x, in1y],[in2x, in2y],[in3x, in3y],\
                      [in4x,in4y]]) 

Alternatively, we can use the following code:

ptsOutput = np.float32([[out1x, out1y], [out2x, out2y], \
                       [out3x, out3y], [out4x, out4y]])

Next, we will pass these two NumPy arrays to the cv2.getPerspectiveTransform function:

M = cv2.getPerspectiveTransform(ptsInput, ptsOutput)

The transformation matrix can then be applied using the following code:

outputImage = cv2.warpPerspective(inputImage, M, \
              (outputImageWidth, outputImageHeight))

The primary difference between perspective and affine transformation is that in perspective transformation, straight lines will still remain straight after the transformation, unlike in affine transformation, where, because of multiple transformations, the line can be converted into a curve.

Let's see how we can use perspective transformation with the help of an exercise.

Exercise 2.03: Perspective Transformation

In this exercise, we will carry out perspective transformation to obtain the front cover of the book given as the input image. We will use OpenCV's cv2.getPerspectiveTransform and cv2.warpPerspective functions. We will be using the following image of a book:

Figure 2.24: Image of a book

Figure 2.24: Image of a book

Note

This image can be downloaded from https://packt.live/2YM9tAA.

Follow these steps to complete this exercise:

  1. Create a new Jupyter Notebook – Exercise2.03.ipynb. We will be writing our code in this notebook.
  2. Import the modules we will be using:
    # Import modules
    import cv2
    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline
  3. Next, read the image.

    Note

    Before proceeding, be sure to change the path to the image (highlighted) based on where the image is saved in your system.

    The code is as follows:

    img = cv2.imread("../data/book.jpg")
  4. Display the image using Matplotlib, as follows:
    plt.imshow(img[:,:,::-1])
    plt.show()

    The output is as follows. The X and Y axes refer to the width and height of the image, respectively:

    Figure 2.25: Output image of a book

    Figure 2.25: Output image of a book

  5. Specify four points in the image so that they lie at the four corners of the front cover. Since we just want the front cover, the output points will be nothing but the corner points of the final 300×300 image. Note that the order of the points should stay the same for both the input and output points:
    inputPts = np.float32([[4,381],
                          [266,429],
                          [329,68],
                          [68,20]])
    outputPts = np.float32([[0,300],
                           [300,300],
                           [300,0],
                           [0,0]])
  6. Next, obtain the transformation matrix:
    M = cv2.getPerspectiveTransform(inputPts,outputPts)
  7. Apply this transformation matrix to carry out perspective transformation:
    dst = cv2.warpPerspective(img,M,(300,300))
  8. Display the resultant image using Matplotlib. The X and Y axes refer to the width and height of the image, respectively:
    plt.imshow(dst[:,:,::-1])
    plt.show()
    Figure 2.26: Front cover of the book

Figure 2.26: Front cover of the book

In this exercise, we saw how we can use geometric transformations to extract the front cover of the book provided in a given image. This can be interesting when we are trying to scan a document and we want to obtain a properly oriented image of the document after scanning.

Note

To access the source code for this specific section, please refer to https://packt.live/2YOqs5u.

With that, we have covered all the major geometric transformations. It's time to cover another important topic in image processing – image arithmetic.

Image Arithmetic

We know that images are nothing but matrices. This raises an important and interesting question. If we can carry out arithmetic operations on matrices, can we carry them out on images as well? The answer is a bit tricky. We can carry out the following operations on images:

  • Adding and subtracting two images
  • Adding and subtracting a constant value to/from image
  • Multiplying a constant value by an image

We can, of course, multiply two images if we can assume they're matrices, but in terms of images, multiplying two images does not make much sense, unless it's carried out pixel-wise.

Let's look at these operations, one by one.

Image Addition

We know that an image is made up of pixels and that the color of a pixel is decided by the pixel value. So, when we talk about adding a constant to an image, we mean adding a constant to the pixel values of the image. While it might seem a very simple task, there is a small concept that needs to be discussed properly. We have already said that, usually, the pixel values in images are represented using unsigned 8-bit integers, and that's why the range of pixel values is from 0 to 255. Now, imagine that you want to add the value 200 to an image, that is, to all the pixels of the image. If we have a pixel value of 220 in the image, the new value will become 220 + 200 = 420. But since this value lies outside of the range (0-255), two things can happen:

  • The value will be clipped to a maximum value. This means that 420 will be revised to 255.
  • The value will follow a cyclic order or a modulo operation. A modulo operation means that you want to find the remainder obtained after dividing one value by another value. We are looking at division by 255 here (the maximum value in the range). That's why 420 will become 165 (the remainder that's obtained after dividing 420 by 255).

Notice how the results obtained in both approaches are different. The first approach, which is also the recommended approach, uses OpenCV's cv2.add function. The usage of the cv2.add function is as follows:

dst = cv2.add(src1, src2)

The second approach uses NumPy while adding two images. The NumPy approach is as follows:

dst = src1 + src2

src1 refers to the first image that we are trying to add to the second image (src2). This gives the resultant image (dst) as the output. Either of these approaches can be used to add two images. The interesting part, as we will see shortly in the next exercise, is that when we add a constant value to an image using OpenCV, by default, that value gets added to the first channel of the image. Since we already know that images in OpenCV are represented in BGR mode, the value gets added to the blue channel, giving the image a more bluish tone. On the other hand, in the NumPy approach, the value is automatically broadcasted in such a manner that the value is added to every channel.

Let's compare both approaches with the help of an exercise.

Exercise 2.04: Performing Image Addition

In this exercise, we will learn how to add a constant value to an image and the approaches to carry out the task. We will be working on the following image. First, we will add a constant value to the image using the cv2.add function and then compare the output obtained with the output obtained by adding the constant value using NumPy:

Figure 2.27: Image of a Golden Retriever puppy

Figure 2.27: Image of a Golden Retriever puppy

Note

This image can be found at https://packt.live/2ZnspEU.

Follow these steps to complete this exercise:

  1. Create a new notebook – Exercise2.04.ipynb. We will be writing our code in this notebook.
  2. Let's import the libraries we will be using and also write the magic command:
    # Import libraries
    import cv2
    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline
  3. Read the image. The code for this is as follows:

    Note

    Before proceeding, be sure to change the path to the image (highlighted) based on where the image is saved in your system.

    # Read image
    img = cv2.imread("../data/puppy.jpg")
  4. Display the image using Matplotlib:
    # Display image
    plt.imshow(img[:,:,::-1])
    plt.show()

    The output is as follows. The X and Y axes refer to the width and height of the image, respectively:

    Figure 2.28: Image of a Golden Retriever puppy

    Figure 2.28: Image of a Golden Retriever puppy

  5. First, try adding the value 100 to the image using NumPy:
    # Add 100 to the image
    numpyImg = img+100
  6. Display the image, as follows:
    # Display image
    plt.imshow(numpyImg[:,:,::-1])
    plt.show()

    The output is as follows. The X and Y axes refer to the width and height of the image, respectively:

    Figure 2.29: Result obtained by adding 100 to the image using NumPy

    Figure 2.29: Result obtained by adding 100 to the image using NumPy

    Notice how adding the value 100 has distorted the image severely. This is because of the modulo operation being performed by NumPy on the new pixel values. This should also give you an idea of why NumPy's approach is not the recommended approach to use while dealing with adding a constant value to an image.

  7. Try doing the same using OpenCV:
    # Using OpenCV
    opencvImg = cv2.add(img,100)
  8. Display the image, as follows:
    # Display image
    plt.imshow(opencvImg[:,:,::-1])
    plt.show()

    The output is as follows. The X and Y axes refer to the width and height of the image, respectively:

    Figure 2.30: Result obtained by adding 100 to the image using OpenCV

    Figure 2.30: Result obtained by adding 100 to the image using OpenCV

    As shown in the preceding figure, there is an increased blue tone to the image. This is because the value 100 has only been added to the first channel of the image, which is the blue channel.

  9. Fix this by adding an image to an image instead of adding a value to an image. Check the shape, as follows:
    img.shape

    The output is (924, 1280, 3).

  10. Now, create an image that's the same shape as the original image that has a fixed pixel value of 100. We are doing this because we want to add the value 100 to every channel of the original image:
    nparr = np.ones((924,1280,3),dtype=np.uint8) * 100
  11. Add nparr to the image and visualize the result:
    opencvImg = cv2.add(img,nparr)
    # Display image
    plt.imshow(opencvImg[:,:,::-1])
    plt.show()

    The output is as follows. The X and Y axes refer to the width and height of the image, respectively:

    Figure 2.31: Result obtained by adding an image to another image

    Figure 2.31: Result obtained by adding an image to another image

  12. When the same image is added using OpenCV, we notice that the results that are obtained are the same as they were for NumPy:
    npImg = img + nparr
    # Display image
    plt.imshow(npImg[:,:,::-1])
    plt.show()

    The output is as follows. The X and Y axes refer to the width and height of the image, respectively:

    Figure 2.32: New results obtained

Figure 2.32: New results obtained

The most interesting point to note from the results is that adding a value to an image using OpenCV results in increased brightness of the image. You can try subtracting a value (or adding a negative value) and see whether the reverse also true.

In this exercise, we added a constant value to an image and compared the outputs obtained by using the cv2.add function and by using NumPy's addition operator (+). We also saw that while using the cv2.add function, the value was added to the blue channel and that it can be fixed by creating an image that's the same shape as the input image, with all the pixels of the value equal to the constant value we want to add to the image.

Note

To access the source code for this specific section, please refer to https://packt.live/2BUOUsV.

In this section, we saw how to perform image addition using NumPy and OpenCV's cv2.add function. Next, let's learn how to multiply an image with a constant value.

Image Multiplication

Image multiplication is very similar to image addition and can be carried out using OpenCV's cv2.multiply function (recommended) or NumPy. OpenCV's function is recommended because of the same reason as we saw for cv2.add in the previous section.

We can use the cv2.multiply function as follows:

dst = cv2.Mul(src1, src2)

Here, src1 and src2 refer to the two images we are trying to multiply and dst refers to the output image obtained by multiplication.

We already know that multiplication is nothing but repetitive addition. Since we saw that image addition had an effect of increasing the brightness of an image, image multiplication will have the same effect. The difference here is that the effect will be manifold. Image addition is typically used when we want to make minute modifications in brightness. Let's directly see how to use these functions with the help of an exercise.

Exercise 2.05: Image Multiplication

In this exercise, we will learn how to use OpenCV and NumPy to multiply a constant value with an image and how to multiply two images.

Note

The image used in this exercise can be found at https://packt.live/2ZnspEU.

Follow these steps to complete this exercise:

  1. Create a new notebook – Exercise2.05.ipynb. We will be writing our code in this notebook.
  2. First, let's import the necessary modules:
    # Import libraries
    import cv2
    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline
  3. Next, read the image of the puppy.

    Note

    Before proceeding, be sure to change the path to the image (highlighted) based on where the image is saved in your system.

    The code for this is as follows:

    # Read image
    img = cv2.imread("../data/puppy.jpg")
  4. Display the output using Matplotlib, as follows:
    # Display image
    plt.imshow(img[:,:,::-1])
    plt.show()

    The output is as follows. The X and Y axes refer to the width and height of the image, respectively:

    Figure 2.33: Image of a Golden Retriever puppy

    Figure 2.33: Image of a Golden Retriever puppy

  5. Multiply the image by 2 using the cv2.multiply function:
    cvImg = cv2.multiply(img,2)
  6. Display the image, as follows:
    plt.imshow(cvImg[:,:,::-1])
    plt.show()

    The output is as follows. The X and Y axes refer to the width and height of the image, respectively:

    Figure 2.34: Result obtained by multiplying the image by 2 using OpenCV

    Figure 2.34: Result obtained by multiplying the image by 2 using OpenCV

    Can you think of why you obtained a high blue tone in the output image? You can use the explanation given for image addition in the previous exercise as a reference.

  7. Try to multiply the image using NumPy:
    npImg = img*2
  8. Display the image, as follows:
    plt.imshow(npImg[:,:,::-1])
    plt.show()

    The output is as follows. The X and Y axes refer to the width and height of the image, respectively:

    Figure 2.35: Result obtained by multiplying the image by 2 using NumPy

    Figure 2.35: Result obtained by multiplying the image by 2 using NumPy

  9. Print the shape of the image, as follows:
    img.shape

    The shape is (924, 1280, 3).

  10. Now, let's create a new array filled with 2s that's the same shape as the original image:
    nparr = np.ones((924,1280,3),dtype=np.uint8) * 2
  11. Now, we will use this new array for multiplication purposes and compare the results that are obtained:
    cvImg = cv2.multiply(img,nparr)
    plt.imshow(cvImg[:,:,::-1])
    plt.show()

    The output is as follows. The X and Y axes refer to the width and height of the image, respectively:

    Figure 2.36: Result obtained by multiplying two images using OpenCV

Figure 2.36: Result obtained by multiplying two images using OpenCV

Note

To access the source code for this specific section, please refer to https://packt.live/2BRos3p.

We know that multiplication is nothing but repetitive addition, so it makes sense to obtain a brighter image using multiplication since we obtained a brighter image using addition as well.

So far, we've discussed geometric transformations and image arithmetic. Now, let's move on to a slightly more advanced topic regarding performing bitwise operations on images. Before we discuss that, let's have a look at binary images.

Binary Images

So far, we have worked with images with one channel (grayscale images) and three channels (color images). We also mentioned that, usually, pixel values in images are represented as 8-bit unsigned integers and that's why they have a range from 0 to 255. But that's not always true. Images can be represented using floating-point values and also with lesser bits, which also reduces the range. For example, an image using 6-bit unsigned integers will have a range between 0 - (26-1) or 0 to 63.

Even though it's possible to use more or fewer bits, typically, we work with only two kinds of ranges – 0 to 255 for 8-bit unsigned integers and images that have only 0 and 1. The second category of images uses only two pixel values, and that's why they are referred to as binary images. Binary images need only a single bit to represent a pixel value. These images are commonly used as masks for selecting or removing a certain region of an image. It is with these images that bitwise operations are commonly used. Can you think of a place where you have seen binary images in real life?

You can find such black-and-white images quite commonly in QR codes. Can you think of some other applications of binary images? Binary images are extensively used for document analysis and even in industrial machine vision tasks. Here is a sample binary image:

Figure 2.37: QR code as an example of a binary image

Figure 2.37: QR code as an example of a binary image

Now, let's see how we can convert an image into a binary image. This technique comes under the category of thresholding. Thresholding refers to the process of converting a color image into a binary image. There is a wide range of thresholding techniques available, but here, we will focus only on a very simple thresholding technique – binary thresholding – since we are working with binary images.

The concept behind binary thresholding is very simple. You choose a threshold value and all the pixel values below and equal to the threshold are replaced with 0, while all the pixel values above the threshold are replaced with a specified value (usually 1 or 255). This way, you end up with an image that has only two unique pixel values, which is what a binary image is.

We can convert an image into a binary image using the following code:

# Set threshold and maximum value
thresh = 125
maxValue = 255
# Binary threshold
th, dst = cv2.threshold(img, thresh, maxValue, \
                        cv2.THRESH_BINARY)

In the preceding code, we first specified the threshold as 125 and then specified the maximum value. This is the value that will replace all the pixel values above the threshold. Finally, we used OpenCV's cv2.threshold function to perform binary thresholding. This function takes the following inputs:

  • The grayscale image that we want to perform thresholding on.
  • thresh: The threshold value.
  • maxValue: The maximum value, which will replace all pixel values above the threshold.
  • th, dst: The thresholding flag. Since we are performing binary thresholding, we will use cv2.THRESH_BINARY.

Let's implement what we've learned about binary thresholding.

Exercise 2.06: Converting an Image into a Binary Image

In this exercise, we will use binary thresholding to convert a color image into a binary image. We will be working on the following image of zebras:

Figure 2.38: Image of zebras

Figure 2.38: Image of zebras

Note

This image can be found at https://packt.live/2ZpQ07Z .

Follow these steps to complete this exercise:

  1. Create a new Jupyter Notebook – Exercise2.06.ipynb. We will be writing our code in this notebook.
  2. Import the necessary modules:
    # Import modules
    import cv2
    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline
  3. Read the image of the zebras and convert it into grayscale. This is necessary because we know that thresholding requires us to provide a grayscale image as an argument.

    Note

    Before proceeding, be sure to change the path to the image (highlighted) based on where the image is saved in your system.

    The code for this is as follows:

    img = cv2.imread("../data/zebra.jpg")
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
  4. Display the grayscale image using Matplotlib:
    plt.imshow(img, cmap='gray')
    plt.show()

    The output is as follows. The X and Y axes refer to the width and height of the image, respectively:

    Figure 2.39: Image in grayscale

    Figure 2.39: Image in grayscale

  5. Use the cv2.thresholding function and set the threshold to 150:
    # Set threshold and maximum value
    thresh = 150
    maxValue = 255
    # Binary threshold
    th, dst = cv2.threshold(img, thresh, maxValue, \
                            cv2.THRESH_BINARY)

    Note

    You can try playing around with the threshold value to obtain different results.

  6. Display the binary image we have obtained:
    plt.imshow(dst, cmap='gray')
    plt.show()

    The output is as follows. The X and Y axes refer to the width and height of the image, respectively:

    Figure 2.40: Binary image

Figure 2.40: Binary image

Note

To access the source code for this specific section, please refer to https://packt.live/2VyYHfa.

In this exercise, we saw how to obtain a binary image using thresholding. Next, let's see how we can carry out bitwise operations on these images.

Bitwise Operations on Images

Let's start by listing the binary operations, along with their results. You must have read about these operations before, so we won't go into their details. The following table provides the truth tables for the bitwise operations as a quick refresher:

Figure 2.41: Bitwise operations and truth tables

Figure 2.41: Bitwise operations and truth tables

Let's see how we can use these functions with the help of an exercise.

Exercise 2.07: Chess Pieces

In this exercise, we will use the XOR operation to find the chess pieces that have moved using two images taken of the same chess game:

Figure 2.42: Two images of chess board

Figure 2.42: Two images of chess board

Note

These images can be found at https://packt.live/3fuxLoU.

Follow these steps to complete this exercise:

  1. Create a new notebook – Exercise2.07.ipynb. We will be writing our code in this notebook.
  2. Import the required modules:
    # Import modules
    import cv2
    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline
  3. Read the images of the board and convert them to grayscale.

    Note

    Before proceeding, be sure to change the path to the images (highlighted) based on where the images are saved in your system.

    The code for this is as follows:

    img1 = cv2.imread("../data/board.png")
    img2 = cv2.imread("../data/board2.png")
    img1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
    img2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
  4. Display these images using Matplotlib:
    plt.imshow(img1,cmap="gray")
    plt.show()

    The output is as follows. The X and Y axes refer to the width and height of the image, respectively:

    Figure 2.43: Grayscale version of the chess image

    Figure 2.43: Grayscale version of the chess image

  5. Plot the second image, as follows:
    plt.imshow(img2,cmap="gray")
    plt.show()

    The output is as follows. The X and Y axes refer to the width and height of the image, respectively:

    Figure 2.44: Grayscale version of the chess image

    Figure 2.44: Grayscale version of the chess image

  6. Threshold both the images using a threshold value of 150 and a maximum value of 255:
    # Set threshold and maximum value
    thresh = 150
    maxValue = 255
    # Binary threshold
    th, dst1 = cv2.threshold(img1, thresh, maxValue, \
                             cv2.THRESH_BINARY)
    # Binary threshold
    th, dst2 = cv2.threshold(img2, thresh, maxValue, \
                             cv2.THRESH_BINARY)
  7. Display these binary images using Matplotlib:
    plt.imshow(dst1, cmap='gray')
    plt.show()

    The output is as follows. The X and Y axes refer to the width and height of the image, respectively:

    Figure 2.45: Binary image

    Figure 2.45: Binary image

  8. Print the second image, as follows:
    plt.imshow(dst2, cmap='gray')
    plt.show()

    The output is as follows. The X and Y axes refer to the width and height of the image, respectively:

    Figure 2.46: Image after thresholding

    Figure 2.46: Image after thresholding

  9. Use bitwise XOR to find the pieces that have moved, as follows:
    dst = cv2.bitwise_xor(dst1,dst2)
  10. Display the result, as follows. The X and Y axes refer to the width and height of the image, respectively:
    plt.imshow(dst, cmap='gray')
    plt.show()

    The output is as follows:

    Figure 2.47: Result of the XOR operation

Figure 2.47: Result of the XOR operation

Notice that, in the preceding image, the four pieces that are present show the initial and final positions of the only two pieces that had changed their positions in the two images. In this exercise, we used the XOR operation to perform motion detection to detect the two chess pieces that had moved their positions after a few steps.

Note

To access the source code for this specific section, please refer to https://packt.live/2NHixQY.

Masking

Let's discuss one last concept related to binary images. Binary images are quite frequently used to serve as a mask. For example, consider the following image. We will be using an image of a disk:

Figure 2.48: Image of a disk

Figure 2.48: Image of a disk

After image thresholding, the mask will look as follows:

Figure 2.49: Binary mask

Figure 2.49: Binary mask

Let's see what happens when we apply masking to the image of the zebras that we worked with earlier:

Figure 2.50: Image of zebras

Figure 2.50: Image of zebras

The final image will look as follows:

Figure 2.51: Final image

Figure 2.51: Final image

Let's start with Figure 2.49. This image is a binary image of a disk after thresholding. Figure 2.50 shows the familiar grayscale image of zebras. When Figure 2.49 is used as a mask to only keep the pixels of Figure 2.50, where the corresponding pixels of Figure 2.50 are white, we end up with the result shown in Figure 2.51. Let's break this down. Consider a pixel, P, at location (x,y) in Figure 2.49. If the pixel, P, is white or non-zero (because zero refers to black), the pixel at location (x,y) in Figure 2.50 will be left as it is. If the pixel, P, was black or zero, the pixel at location (x,y) in Figure 2.50 will be replaced with 0. This refers to a masking operation since Figure 2.49 is covering Figure 2.50 as a mask and displaying only a few selected pixels. Such an operation can be easily carried out using the following code:

result = np.where(mask, image, 0)

Let's understand what is happening here. NumPy's np.where function says that wherever the mask (first argument) is non-zero, return the value of the image (second argument); otherwise, return 0 (third argument). This is exactly what we discussed in the previous paragraph. We will be using masks in Chapter 5, Face Processing in Image and Video, as well.

Now, it's time for you to try out the concepts that you have studied so far to replicate the result shown in Figure 2.51.

Activity 2.01: Masking Using Binary Images

In this activity, you will be using masking and other concepts you've studied in this chapter to replicate the result shown in Figure 2.51. We will be using image resizing, image thresholding, and image masking concepts to display only the heads of the zebras present in Figure 2.50. A similar concept can be applied to create nice portraits of photos where only the face of the person is visible and the rest of the region/background is blacked out. Let's start with the steps that you need to follow to complete this activity:

  1. Create a new notebook – Activity2.01.ipynb. You will be writing your code in this notebook.
  2. Import the necessary libraries – OpenCV, NumPy, and Matplotlib. You will also need to add the magic command to display images inside the notebook.
  3. Read the image titled recording.jpg from the disk and convert it to grayscale.

    Note

    This image can be found at https://packt.live/32c3pDK.

  4. Next, you will have to perform thresholding on this image. You can use a threshold of 150 and a maximum value of 255.
  5. The thresholded image should be similar to the one shown in Figure 2.49.
  6. Next, read the image of the zebras (titled zebras.jpg) and convert it to grayscale.

    Note

    This image can be found at https://packt.live/2ZpQ07Z.

  7. Before moving on to using NumPy's where command for masking, we need to check whether the images have the same size or not. Print the shapes of both images (zebras and disk).
  8. You will notice that the images have different dimensions. Resize both images to 1,280×800 pixels. This means that the width of the resized image should be 1,280 pixels and that the height should be 800 pixels. You will have to use the cv2.resize function for resizing. Use linear interpolation while resizing the images.
  9. Next, use NumPy's where command to only keep the pixels where the disk pixels are white. The other pixels should be replaced with black color.

By completing this activity, you will get an output similar to the following:

Figure 2.52: Zebra image

Figure 2.52: Zebra image

The result that we have obtained in this activity can be used in portrait photography, where only the subject of the image is highlighted and the background is replaced with black.

Note

The solution for this activity can be found via this link.

By completing this activity, you have learned how to use image resizing to change the shape of an image, image thresholding to convert a color image into a binary image, and bitwise operations to perform image masking. Notice how image masking can be used to "mask" or hide certain regions of an image and display only the remaining portion of the image. This technique is used extensively in document analysis in computer vision.

Summary

In this chapter, we started by understanding the various geometric transforms and their matrix representations. We also saw how we can use OpenCV's functions to carry out these transformations. Then, we performed arithmetic operations on images and saw how addition and multiplication with a constant can be understood as a brightening operation. Next, we discussed binary images and how they can be obtained using binary thresholding. Then, we discussed bitwise operations and how they can be carried out using OpenCV. Finally, we had a look at the concept of masking, where we used NumPy to obtain interesting results.

In the next chapter, we will jump into the world of histograms and discuss how we can use histogram matching and histogram equalization as more advanced image processing steps.

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Understand OpenCV and select the right algorithm to solve real-world problems
  • Discover techniques for image and video processing
  • Learn how to apply face recognition in videos to automatically extract key information

Description

Computer Vision (CV) has become an important aspect of AI technology. From driverless cars to medical diagnostics and monitoring the health of crops to fraud detection in banking, computer vision is used across all domains to automate tasks. The Computer Vision Workshop will help you understand how computers master the art of processing digital images and videos to mimic human activities. Starting with an introduction to the OpenCV library, you'll learn how to write your first script using basic image processing operations. You'll then get to grips with essential image and video processing techniques such as histograms, contours, and face processing. As you progress, you'll become familiar with advanced computer vision and deep learning concepts, such as object detection, tracking, and recognition, and finally shift your focus from 2D to 3D visualization. This CV course will enable you to experiment with camera calibration and explore both passive and active canonical 3D reconstruction methods. By the end of this book, you'll have developed the practical skills necessary for building powerful applications to solve computer vision problems.

Who is this book for?

If you are a researcher, developer, or data scientist looking to automate everyday tasks using computer vision, this workshop is for you. A basic understanding of Python and deep learning will help you to get the most out of this workshop.

What you will learn

  • Access and manipulate pixels in OpenCV using BGR and grayscale images
  • Create histograms to better understand image content
  • Use contours for shape analysis, object detection, and recognition
  • Track objects in videos using a variety of trackers available in OpenCV
  • Discover how to apply face recognition tasks using computer vision techniques
  • Visualize 3D objects in point clouds and polygon meshes using Open3D

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Jul 27, 2020
Length: 568 pages
Edition : 1st
Language : English
ISBN-13 : 9781800207141
Category :
Languages :
Concepts :
Tools :

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
OR
Modal Close icon
Payment Processing...
tick Completed

Billing Address

Product Details

Publication date : Jul 27, 2020
Length: 568 pages
Edition : 1st
Language : English
ISBN-13 : 9781800207141
Category :
Languages :
Concepts :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
$19.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
$199.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts
$279.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total $ 180.97
The Reinforcement Learning Workshop
$48.99
The Deep Learning Workshop
$43.99
The Computer Vision Workshop
$87.99
Total $ 180.97 Stars icon
Banner background image

Table of Contents

8 Chapters
1. Basics of Image Processing Chevron down icon Chevron up icon
2. Common Operations When Working with Images Chevron down icon Chevron up icon
3. Working with Histograms Chevron down icon Chevron up icon
4. Working with contours Chevron down icon Chevron up icon
5. Face Processing in Image and Video Chevron down icon Chevron up icon
6. Object Tracking Chevron down icon Chevron up icon
7. Object Detection and Face Recognition Chevron down icon Chevron up icon
8. OpenVINO with OpenCV Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Full star icon Full star icon Full star icon Full star icon Half star icon 4.3
(6 Ratings)
5 star 50%
4 star 33.3%
3 star 16.7%
2 star 0%
1 star 0%
Filter icon Filter
Top Reviews

Filter reviews by




Luis René Mata Quiñonez Nov 06, 2020
Full star icon Full star icon Full star icon Full star icon Full star icon 5
This book is amazing! It is a very complete guide for image processing implementation from basic to complex tasks. It is hands-on focused, explaining the exercises step by step and showing the Python code and links that show the source code. I truly recommend this book because it is very well structured and easy to follow. It is full of very useful tools that you can apply on your own projects.
Amazon Verified review Amazon
Bill Pang Jan 19, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
simple and easy steps to follow during the learning process.
Amazon Verified review Amazon
Rajeev Oct 01, 2020
Full star icon Full star icon Full star icon Full star icon Full star icon 5
What I loved about this book was that it didn't skimp on the basics, it covered many important topics that you don't see covered in most texts. The OpenCV areas with Object Tracking were well developed. Overall this is a perfect book for a College or University student looking to understand the basics. It will help tremendously when implementing your own projects. The Deep Learning areas don't go into detail but nevertheless cover what you need to know. I was quite fond of the OpenVino section as well. Overall, an excellent text on CV.
Amazon Verified review Amazon
Sachin Guruswamy Nov 22, 2020
Full star icon Full star icon Full star icon Full star icon Empty star icon 4
The book is mainly on implementation. if you are looking for theoretical understanding this is not for you.If you want to learn implementing some simple projects learn how to write Image processing pipelines using opencv this is a really good book. it also covers openvino which helps if you are looking into EdgeAI too.
Amazon Verified review Amazon
nikesh Mar 23, 2021
Full star icon Full star icon Full star icon Full star icon Empty star icon 4
The book The COMPUTER VISION WORKSHOP does a great job explaining the basics of images, color spaces, channels, pixels, processing, and transforming images. It then dives deeper into various topics such as face detection, occlusion, tracking objects in a video stream, facial recognition, object detection, etc. Within each topic, it provides a quick refresher on the libraries used, such as Numpy, Matplotlib, Open CV, OpenVINO, etc., and explains the topic using several algorithms, techniques, tools, and walkthrough exercises. It also provides recommendations on which tools work better for each task. For example, it explains why the deep learning approach is better than machine learning approaches in real-life applications for some of the computer vision tasks mentioned above. In some areas, the book can do a better job explaining topics, such as extracting channels of an image in different color spaces, PCA, and Deep learning approaches. Though it requires a basic understanding of working with a Jupiter notebook and Python, this book is an excellent resource for beginner to moderate level readers in Computer Vision
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

How do I buy and download an eBook? Chevron down icon Chevron up icon

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the ebook to be usable for you the reader with our needs to protect the rights of us as Publishers and of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website? Chevron down icon Chevron up icon

If you want to purchase a video course, eBook or Bundle (Print+eBook) please follow below steps:

  1. Register on our website using your email address and the password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Cart, or PayPal)
Where can I access support around an eBook? Chevron down icon Chevron up icon
  • If you experience a problem with using or installing Adobe Reader, the contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats do Packt support? Chevron down icon Chevron up icon

Our eBooks are currently available in a variety of formats such as PDF and ePubs. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks? Chevron down icon Chevron up icon
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower price than print
  • They save resources and space
What is an eBook? Chevron down icon Chevron up icon

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply login to your account and click on the link in Your Download Area. We recommend you saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.