Hands-On Vision and Behavior for Self-Driving Cars: Explore visual perception, lane detection, and object classification with Python 3 and OpenCV 4


Chapter 1: OpenCV Basics and Camera Calibration

This chapter is an introduction to OpenCV and how to use it in the initial phase of a self-driving car pipeline, to ingest a video stream and prepare it for the next phases. We will discuss the characteristics of a camera from the point of view of a self-driving car and how to improve the quality of what we get out of it. We will also study how to manipulate videos, and we will try one of the most famous features of OpenCV, object detection, which we will use to detect pedestrians.

With this chapter, you will build a solid foundation on how to use OpenCV and NumPy, which will be very useful later.

In this chapter, we will cover the following topics:

  • OpenCV and NumPy basics
  • Reading, manipulating, and saving images
  • Reading, manipulating, and saving videos
  • Manipulating images
  • How to detect pedestrians with HOG
  • Characteristics of a camera
  • How to perform the camera calibration

Technical requirements

For the instructions and code in this chapter, you need the following:

  • Python 3.7
  • The opencv-python module
  • The NumPy module

The code for the chapter can be found here:

https://github.com/PacktPublishing/Hands-On-Vision-and-Behavior-for-Self-Driving-Cars/tree/master/Chapter1

The Code in Action videos for this chapter can be found here:

https://bit.ly/2TdfsL7

Introduction to OpenCV and NumPy

OpenCV is a computer vision and machine learning library that has been developed for more than 20 years and provides an impressive number of functionalities. Despite some inconsistencies in the API, its simplicity and the remarkable number of algorithms implemented make it an extremely popular library and an excellent choice for many situations.

OpenCV is written in C++, but there are bindings for Python, Java, and Android.

In this book, we will focus on OpenCV for Python, with all the code tested using OpenCV 4.2.

OpenCV in Python is provided by opencv-python, which can be installed using the following command:

pip install opencv-python
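To confirm that the installation works, you can import the module and print its version; the exact number you see depends on what pip installed:

import cv2

print(cv2.__version__)   # for example, 4.2.0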

OpenCV can take advantage of hardware acceleration, but to get the best performance, you might need to build it from the source code, with different flags than the default, to optimize it for your target hardware.

OpenCV and NumPy

The Python bindings use NumPy, which increases the flexibility and makes it compatible with many other libraries. As an OpenCV image is a NumPy array, you can use normal NumPy operations to get information about the image. A good understanding of NumPy can improve the performance and reduce the length of your code.

Let's dive right in with some quick examples of what you can do with NumPy in OpenCV.

Image size

The size of the image can be retrieved using the shape attribute:

print("Image size: ", image.shape)

For a 50x50 grayscale image, image.shape would be the tuple (50, 50), while for an RGB image of the same size, the result would be (50, 50, 3).

False friends

In NumPy, the size attribute is the total number of elements in the array; for a 50x50 grayscale image, it would be 2,500, while for the same image in RGB, it would be 7,500. It's the shape attribute that contains the dimensions of the image – (50, 50) and (50, 50, 3), respectively.
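As a quick illustration of the difference, the following short snippet prints both attributes for two freshly created images:

import numpy as np

gray = np.zeros((50, 50), dtype=np.uint8)
rgb = np.zeros((50, 50, 3), dtype=np.uint8)

print(gray.shape, gray.size)   # (50, 50) 2500
print(rgb.shape, rgb.size)     # (50, 50, 3) 7500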

Grayscale images

Grayscale images are represented by a two-dimensional NumPy array. The first index selects the rows (the y coordinate) and the second index selects the columns (the x coordinate). The origin is the top-left corner of the image, with y growing downward and x growing to the right.

It is possible to create a black image using np.zeros(), which initializes all the pixels to 0:

black = np.zeros([100,100],dtype=np.uint8)  # Creates a black image

The previous code creates a grayscale image with size (100, 100), composed of 10,000 unsigned bytes (dtype=np.uint8).

To create an image with pixels with a different value than 0, you can use the full() method:

white = np.full([50, 50], 255, dtype=np.uint8)

To change the color of all the pixels at once, it's possible to use the [:] notation:

img[:] = 64        # Change the pixels color to dark gray

To affect only some rows, it is enough to provide a range of rows in the first index:

img[10:20] = 192   # Paints 10 rows with light gray

The previous code changes the color of rows 10-20, including row 10, but excluding row 20.

The same mechanism works for columns; you just need to specify the range in the second index. To instruct NumPy to include a full index, we use the [:] notation that we already encountered:

img[:, 10:20] = 64 # Paints 10 columns with dark gray

You can also combine operations on rows and columns, selecting a rectangular area:

img[90:100, 90:100] = 0  # Paints a 10x10 area with black

It is, of course, possible to operate on a single pixel, as you would do on a normal array:

img[50, 50] = 0  # Paints one pixel with black

It is possible to use NumPy to select a part of an image, also called the Region Of Interest (ROI). For example, the following code copies a 10x10 ROI from the position (90, 90) to the position (80, 80):

roi = img[90:100, 90:100]
img[80:90, 80:90] = roi 

The following is the result of the previous operations:

Figure 1.1 – Some manipulation of images using NumPy slicing

To make a copy of an image, you can simply use the copy() method:

image2 = image.copy()

RGB images

RGB images differ from grayscale because they are three-dimensional, with the third index representing the three channels. Please note that OpenCV stores the images in BGR format, not RGB, so channel 0 is blue, channel 1 is green, and channel 2 is red.

Important note

OpenCV stores the images as BGR, not RGB. In the rest of the book, when talking about RGB images, it will only mean that it is a 24-bit color image, but the internal representation will usually be BGR.

To create an RGB image, we need to provide three sizes:

rgb = np.zeros([100, 100, 3],dtype=np.uint8)  

If you were to run the same code previously used on the grayscale image on this new RGB image (omitting the third index), you would get the same result. This is because NumPy would apply the same value to all three channels, which results in a shade of gray.

To select a color, it is enough to provide the third index:

rgb[:, :, 2] = 255       # Makes the image red

In NumPy, it is also possible to select rows, columns, or channels that are not contiguous. You can do this by simply providing a tuple with the required indexes. To make the image magenta, you need to set the blue and red channels to 255, which can be achieved with the following code:

rgb[:, :, (0, 2)] = 255  # Makes the image magenta

You can convert an RGB image into grayscale using cvtColor():

gray = cv2.cvtColor(original, cv2.COLOR_BGR2GRAY)

Working with image files

OpenCV provides a very simple way to load images, using imread():

import cv2
image = cv2.imread('test.jpg')

To show the image, you can use imshow(), which accepts two parameters:

  • The name to write on the caption of the window that will show the image
  • The image to be shown

Unfortunately, its behavior is counterintuitive, as it will not show an image unless it is followed by a call to waitKey():

cv2.imshow("Image", image)cv2.waitKey(0)

The call to waitKey() after imshow() will have two effects:

  • It will actually allow OpenCV to show the image provided to imshow().
  • It will wait for the specified number of milliseconds, or until a key is pressed; if the number of milliseconds passed is <= 0, it will wait indefinitely.

An image can be saved on disk using the imwrite() method, which accepts three parameters:

  • The name of the file
  • The image
  • An optional format-dependent parameter:
cv2.imwrite("out.jpg", image)

Sometimes, it can be very useful to combine multiple pictures by putting them next to each other. Some examples in this book will use this feature extensively to compare images.

OpenCV provides two methods for this purpose: hconcat() to concatenate the pictures horizontally and vconcat() to concatenate them vertically, both accepting as a parameter a list of images. Take the following example:

black = np.zeros([50, 50], dtype=np.uint8)
white = np.full([50, 50], 255, dtype=np.uint8)
cv2.imwrite("horizontal.jpg", cv2.hconcat([white, black]))
cv2.imwrite("vertical.jpg", cv2.vconcat([white, black]))

Here's the result:

Figure 1.2 – Horizontal concatenation with hconcat() and vertical concatenation with vconcat()

We could use these two methods to create a chequerboard pattern:

row1 = cv2.hconcat([white, black])
row2 = cv2.hconcat([black, white])
cv2.imwrite("chess.jpg", cv2.vconcat([row1, row2]))

You will see the following chequerboard:

Figure 1.3 – A chequerboard pattern created using hconcat() in combination with vconcat()

After having worked with images, it's time we work with videos.

Working with video files

Using videos in OpenCV is very simple; in fact, every frame is an image and can be manipulated with the methods that we have already analyzed.

To open a video in OpenCV, you need to create a VideoCapture object:

cap = cv2.VideoCapture("video.mp4")

After that, you can call read(), typically in a loop, to retrieve a single frame. The method returns a tuple with two values:

  • A Boolean value that is false when the video is finished
  • The next frame:
ret, frame = cap.read()

To save a video, there is the VideoWriter object; its constructor accepts four parameters:

  • The filename
  • A FOURCC (four-character code) identifying the video codec
  • The number of frames per second
  • The resolution

Take the following example:

mp4 = cv2.VideoWriter_fourcc(*'MP4V')
writer = cv2.VideoWriter('video-out.mp4', mp4, 15, (640, 480))

Once VideoWriter has been created, the write() method can be used to add a frame to the video file:

writer.write(image)

When you have finished using the VideoCapture and VideoWriter objects, you should call their release method:

cap.release()
writer.release()
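The following sketch ties these pieces together: it reads video.mp4 frame by frame and writes an identical copy, taking the frame rate and resolution from the input instead of hardcoding them. The file names are example values:

import cv2

cap = cv2.VideoCapture("video.mp4")
fps = cap.get(cv2.CAP_PROP_FPS) or 15            # Fall back to 15 FPS if unknown
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

fourcc = cv2.VideoWriter_fourcc(*'MP4V')
writer = cv2.VideoWriter("video-copy.mp4", fourcc, fps, (width, height))

while True:
    ret, frame = cap.read()
    if not ret:          # False when the video is finished
        break
    writer.write(frame)  # Any per-frame processing would go here

cap.release()
writer.release()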

Working with webcams

Webcams are handled similarly to a video in OpenCV; you just need to provide a different parameter to VideoCapture, which is the 0-based index identifying the webcam:

cap = cv2.VideoCapture(0)

The previous code opens the first webcam; if you need to use a different one, you can specify a different index.
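A minimal sketch to display the live feed until the q key is pressed could look like this:

import cv2

cap = cv2.VideoCapture(0)                   # First webcam

while True:
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow("Webcam", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):   # Wait 1 ms; quit on 'q'
        break

cap.release()
cv2.destroyAllWindows()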

Now, let's try manipulating some images.

Manipulating images

As part of a computer vision pipeline for a self-driving car, with or without deep learning, you might need to process the video stream to make other algorithms work better as part of a preprocessing step.

This section will provide you with a solid foundation to preprocess any video stream.

Flipping an image

OpenCV provides the flip() method to flip an image, and it accepts two parameters:

  • The image
  • A number that can be 1 (horizontal flip), 0 (vertical flip), or -1 (both horizontal and vertical flip)

Let's see a sample code:

flipH = cv2.flip(img, 1)
flipV = cv2.flip(img, 0)
flip = cv2.flip(img, -1)

This will produce the following result:

Figure 1.4 – Original image, horizontally flipped, vertically flipped, and both

As you can see, the first image is our original, followed by the same image flipped horizontally, flipped vertically, and flipped both horizontally and vertically.

Blurring an image

Sometimes, an image can be too noisy, possibly because of some processing steps that you have done. OpenCV provides several methods to blur an image, which can help in these situations. Most likely, you will have to take into consideration not only the quality of the blur but also the speed of execution.

The simplest method is blur(), which applies a low-pass filter to the image and requires at least two parameters:

  • The image
  • The kernel size (a bigger kernel means more blur):
blurred = cv2.blur(image, (15, 15))

Another option is to use GaussianBlur(), which offers more control and requires at least three parameters:

  • The image
  • The kernel size
  • sigmaX, which is the standard deviation on X

It is recommended to specify both sigmaX and sigmaY (the standard deviation on Y, the fourth parameter):

gaussian = cv2.GaussianBlur(image, (15, 15), sigmaX=15, sigmaY=15)

An interesting blurring method is medianBlur(), which computes the median and therefore only emits pixel values that are actually present in the image (which is not necessarily the case with the previous methods). It is effective at reducing "salt and pepper" noise and has two mandatory parameters:

  • The image
  • The kernel size (an odd integer greater than 1):
median = cv2.medianBlur(image, 15)

There is also a more complex filter, bilateralFilter(), which is effective at removing noise while keeping the edge sharp. It is the slowest of the filters, and it requires at least four parameters:

  • The image
  • The diameter of each pixel neighborhood
  • sigmaColor: Filters sigma in the color space, affecting how much the different colors are mixed together, inside the pixel neighborhood
  • sigmaSpace: Filters sigma in the coordinate space, affecting how distant pixels affect each other, if their colors are closer than sigmaColor:
bilateral = cv2.bilateralFilter(image, 15, 50, 50)

Choosing the best filter will probably require some experiments. You might also need to consider the speed. To give you some ballpark estimations based on my tests, and considering that the performance is dependent on the parameters supplied, note the following:

  • blur() is the fastest.
  • GaussianBlur() is similar, but it can be 2x slower than blur().
  • medianBlur() can easily be 20x slower than blur().
  • bilateralFilter() is the slowest and can be 45x slower than blur().

Here are the resultant images:

Figure 1.5 – Original, blur(), GaussianBlur(), medianBlur(), and bilateralFilter(), with the parameters used in the code samples
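If you want to get a feel for these ratios on your own hardware, a rough timing sketch such as the following can help; the absolute numbers depend heavily on the image size, the parameters, and your CPU, so treat them as indicative only:

import timeit
import numpy as np
import cv2

image = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)  # Random test image

filters = [
    ("blur", lambda: cv2.blur(image, (15, 15))),
    ("GaussianBlur", lambda: cv2.GaussianBlur(image, (15, 15), sigmaX=15, sigmaY=15)),
    ("medianBlur", lambda: cv2.medianBlur(image, 15)),
    ("bilateralFilter", lambda: cv2.bilateralFilter(image, 15, 50, 50)),
]

for name, fn in filters:
    print(name, round(timeit.timeit(fn, number=20), 3), "seconds for 20 runs")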

Changing contrast, brightness, and gamma

A very useful function is convertScaleAbs(), which executes several operations on all the values of the array:

  • It multiplies them by the scaling parameter, alpha.
  • It adds the delta parameter, beta, to them.
  • It takes the absolute value of the result.
  • If the result is above 255, it is set to 255.
  • The result is converted into an unsigned 8-bit int.

The function accepts four parameters:

  • The source image
  • The destination (optional)
  • The alpha parameter used for the scaling
  • The beta delta parameter

convertScaleAbs() can be used to affect the contrast, as an alpha scaling factor above 1 increases the contrast (amplifying the color difference between pixels), while a scaling factor below one reduces it (decreasing the color difference between pixels):

cv2.convertScaleAbs(image, more_contrast, 2, 0)
cv2.convertScaleAbs(image, less_contrast, 0.5, 0)

It can also be used to affect the brightness, as the beta delta factor can be used to increase the value of all the pixels (increasing the brightness) or to reduce them (decreasing the brightness):

cv2.convertScaleAbs(image, more_brightness, 1, 64)
cv2.convertScaleAbs(image, less_brightness, 1, -64)

Let's see the resulting images:

Figure 1.6 – Original, more contrast (2x), less contrast (0.5x), more brightness (+64), and less brightness (-64)

A more sophisticated method to change the brightness is to apply gamma correction. This can be done with a simple calculation using NumPy. A gamma value above 1 will increase the brightness, and a gamma value below 1 will reduce it:

Gamma = 1.5
g_1_5 = np.array(255 * (image / 255) ** (1 / Gamma), dtype='uint8')
Gamma = 0.7
g_0_7 = np.array(255 * (image / 255) ** (1 / Gamma), dtype='uint8')

The following images will be produced:

Figure 1.7 – Original, higher gamma (1.5), and lower gamma (0.7)

You can see the effect of different gamma values in the middle and right images.
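The same correction can also be expressed as a 256-entry lookup table applied with cv2.LUT(), which avoids recomputing the power for every pixel. Here is a small sketch of that alternative; the helper name adjust_gamma and the file name are just examples:

import numpy as np
import cv2

def adjust_gamma(image, gamma):
    # Precompute the output value for each of the 256 possible inputs
    table = np.array([255 * (i / 255) ** (1 / gamma) for i in range(256)],
                     dtype=np.uint8)
    return cv2.LUT(image, table)

image = cv2.imread("test.jpg")
brighter = adjust_gamma(image, 1.5)
darker = adjust_gamma(image, 0.7)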

Drawing rectangles and text

When working on object detection tasks, it is a common need to highlight an area to see what has been detected. OpenCV provides the rectangle() function, accepting at least the following parameters:

  • The image
  • The upper-left corner of the rectangle
  • The lower-right corner of the rectangle
  • The color to use
  • (Optional) The thickness:
cv2.rectangle(image, (x, y), (x + w, y + h), (255, 255, 255), 2)

To write some text in the image, you can use the putText() method, accepting at least six parameters:

  • The image
  • The text to print
  • The coordinates of the bottom-left corner
  • The font face
  • The scale factor, to change the size
  • The color:
cv2.putText(image, 'Text', (x, y), cv2.FONT_HERSHEY_PLAIN, 2, clr)

Pedestrian detection using HOG

The Histogram of Oriented Gradients (HOG) is an object detection technique implemented by OpenCV. In simple cases, it can be used to see whether there is a certain object present in the image, where it is, and how big it is.

OpenCV includes a detector trained for pedestrians, and you are going to use it. It might not be enough for a real-life situation, but it is useful to learn how to use it. You could also train another one with more images to see whether it performs better. Later in the book, you will see how to use deep learning to detect not only pedestrians but also cars and traffic lights.

Sliding window

The HOG pedestrian detector in OpenCV is trained with a model that is 64x128 pixels (the size used by the default people detector), and therefore it is not able to detect objects smaller than that (or, better, it could, but the box will be 64x128).

At the core of the HOG detector, there is a mechanism able to tell whether a given 64x128 image is a pedestrian. As this is not terribly useful on its own, OpenCV implements a sliding window mechanism, where the detector is applied many times, on slightly different positions; the "image window" under consideration slides a bit. Once it has analyzed the whole image, the image window is increased in size (scaled) and the detector is applied again, to be able to detect bigger objects. Therefore, the detector is applied hundreds or even thousands of times for each image, which can be slow.

Using HOG with OpenCV

First, you need to initialize the detector and specify that you want to use the detector for pedestrians:

hog = cv2.HOGDescriptor()
det = cv2.HOGDescriptor_getDefaultPeopleDetector()
hog.setSVMDetector(det)

Then, it is just a matter of calling detectMultiScale():

(boxes, weights) = hog.detectMultiScale(image, winStride=(1, 1), padding=(0, 0), scale=1.05)

The parameters that we used require some explanation, and they are as follows:

  • The image
  • winStride, the window stride, which specifies how much the sliding window moves every time
  • padding, which can add some padding pixels at the border of the image (useful to detect pedestrians close to the border)
  • scale, which specifies how much to enlarge the image window at each iteration

You should consider that decreasing winStride can improve the accuracy (as more positions are considered), but it has a big impact on performance. For example, a stride of (4, 4) can be up to 16 times faster than a stride of (1, 1), though in practice, the performance difference is a bit less, maybe 10 times.

In general, decreasing the scale also improves the precision and decreases the performance, though the impact is not dramatic.

Improving the precision means detecting more pedestrians, but this can also increase the false positives. detectMultiScale() has a couple of advanced parameters that can be used for this:

  • hitThreshold, which changes the distance required from the Support Vector Machine (SVM) plane. A higher threshold means that the detector is more confident in the result.
  • finalThreshold, which is related to the number of detections in the same area.

Tuning these parameters requires some experiments, but in general, a higher hitThreshold value (typically in the range 0–1.0) should reduce the false positives.

A higher finalThreshold value (such as 10) will also reduce the false positives.

We will use detectMultiScale() on an image with pedestrians generated by Carla:

Figure 1.8 – HOG detection, winStride=(1, 2), scale=1.05, padding=(0, 0). Left: hitThreshold = 0, finalThreshold = 1; Center: hitThreshold = 0, finalThreshold = 3; Right: hitThreshold = 0.2, finalThreshold = 1

As you can see, we have pedestrians being detected in the image. Using a low hit threshold and a low final threshold can result in false positives, as in the left image. Your goal is to find the right balance, detecting the pedestrians but without having too many false positives.
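A minimal end-to-end sketch that ties this section together – loading an image, running the detector, and drawing a labeled box around each detection with rectangle() and putText() – could look like the following; the file name and parameter values are example choices, not the only reasonable ones:

import cv2

image = cv2.imread("pedestrians.jpg")

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

boxes, weights = hog.detectMultiScale(image, winStride=(4, 4),
                                      padding=(8, 8), scale=1.05)

for (x, y, w, h) in boxes:
    cv2.rectangle(image, (x, y), (x + w, y + h), (255, 255, 255), 2)
    cv2.putText(image, 'person', (x, y - 5), cv2.FONT_HERSHEY_PLAIN, 1, (255, 255, 255))

cv2.imshow("Pedestrians", image)
cv2.waitKey(0)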

Introduction to the camera

Cameras are probably the most ubiquitous sensors in our modern world. They are used in everyday life in our mobile phones, laptops, surveillance systems, and, of course, photography. They provide rich, high-resolution imagery containing extensive information about the environment, including spatial, color, and temporal information.

It is no surprise that they are heavily used in self-driving technologies. One reason why the camera is so popular is that it mirrors the functionality of the human eye. For this reason, we are very comfortable using cameras, as we connect on a deep level with their functionality, limitations, and strengths.

In this section, you will learn about the following:

  • Camera terminology
  • The components of a camera
  • Strengths and weaknesses
  • Choosing the right camera for self-driving

Let's discuss each in detail.

Camera terminology

Before you learn about the components of a camera and its strengths and weaknesses, you need to know some basic terminology. These terms will be important when evaluating and ultimately choosing your camera for your self-driving application.

Field of View (FoV)

This is the vertical and horizontal angular portion of the environment (scene) that is visible to the sensor. In self-driving cars, you typically want to balance the FoV with the resolution of the sensor to ensure you see as much of the environment as possible with the least number of cameras. There is a trade-off related to FoV: a larger FoV usually means more lens distortion, which you will need to compensate for in your camera calibration (see the section on camera calibration):

Figure 1.9 – Field of View, credit: https://www.researchgate.net/figure/Illustration-of-camera-lenss-field-of-view-FOV_fig4_335011596

Resolution

This is the total number of pixels in the horizontal and vertical directions on the sensor. This parameter is often discussed using the term megapixels (MP). For example, a 5 MP camera, such as the FLIR Blackfly, has a sensor with 2448 × 2048 pixels, which equates to 5,013,504 pixels.

Higher resolutions allow you to use a lens with a wider FoV but still provide the detail needed for running your computer vision algorithms. This means you can use fewer cameras to cover the environment and thereby lower the cost.

The Blackfly, in all its different flavors, is a common camera used in self-driving vehicles thanks to its cost, small form, reliability, robustness, and ease of integration:

Figure 1.10 – Pixel resolution

Focal length

This is the length from the lens optical center to the sensor. The focal length is best thought of as the zoom of the camera. A longer focal length means you will be zoomed in closer to objects in the environment. In your self-driving car, you may choose different focal lengths based on what you need to see in the environment. For example, you might choose a relatively long focal length of 100 mm to ensure enough resolution for your classifier algorithm to detect a traffic signal at a distance far enough to allow the car to react with smooth and safe stopping:

Figure 1.11 – Focal length, credit: https://photographylife.com/what-is-focal-length-in-photography

Aperture and f-stop

This is the opening through which light passes to shine on the sensor. The unit that is commonly used to describe the size of the opening is the f-stop, which refers to the ratio of the focal length over the aperture size. For example, a lens with a 50 mm focal length and an aperture diameter of 35 mm will equate to an f-stop of f/1.4. The following figure illustrates different aperture diameters and their f-stop values on a 50 mm focal length lens. Aperture size is very important in your self-driving car as it is directly correlated with the Depth of Field (DoF). Large apertures also allow the camera to be tolerant of obscurants (for example, bugs) that may be on the lens. Larger apertures allow light to pass around the bug and still make it to the sensor:

Figure 1.12 – Aperture, credit: https://en.wikipedia.org/wiki/Aperture#/media/File:Lenses_with_different_apertures.jpg

Depth of field (DoF)

This is the distance range in the environment that will be in focus. This is directly correlated to the size of the aperture. Generally, in self-driving cars, you will want a deep DoF so that everything in the FoV is in focus for your computer vision algorithms. The problem is that deep DoF is achieved with a small aperture, which means less light impacting the sensor. So, you will need to balance DoF with dynamic range and ISO to ensure you see everything you need to in your environment.

The following figure depicts the relationship between DoF and aperture:

Figure 1.13 – DoF versus aperture, credit: https://thumbs.dreamstime.com/z/aperture-infographic-explaining-depth-field-corresponding-values-their-effect-blur-light-75823732.jpg

Dynamic range

This is a property of the sensor that indicates its contrast ratio or the ratio of the brightest over the darkest subjects that it can resolve. This may be referred to using the unit dB (for example, 78 dB) or contrast ratio (for example, 2,000,000/1).

Self-driving cars need to operate both during the day and at night. This means that the sensor needs to be sensitive enough to provide useful detail in dark conditions while not oversaturating when driving in bright sunlight. Another reason for High Dynamic Range (HDR) is driving when the sun is low on the horizon. I am sure you have experienced this while driving to work in the morning: the sun is right in your face, and you can barely see the environment in front of you because it is saturating your eyes. HDR means that the sensor will be able to see the environment even in the face of direct sunlight. The following figure illustrates these conditions:

Figure 1.14 – Example HDR, credit: https://petapixel.com/2011/05/02/use-iso-numbers-that-are-multiples-of-160-when-shooting-dslr-video/

Your dream dynamic range

If you could make a wish and have whatever dynamic range you wanted in your sensor, what would it be?

International Organization for Standardization (ISO) sensitivity

This is the sensitivity of the pixels to incoming photons.

Wait a minute, you say, do you have your acronym mixed up? It looks like it, but the International Organization for Standardization decided to standardize even their acronym since it would be different in every language otherwise. Thanks, ISO!

The standardized ISO values can range from 100 to upward of 10,000. Lower ISO values correspond to a lower sensitivity of the sensor. Now you may ask, "why wouldn't I want the highest sensitivity?" Well, sensitivity comes at a cost...NOISE. The higher the ISO, the more noise you will see in your images. This added noise may cause trouble for your computer vision algorithms when trying to classify objects. In the following figure, you can see the effect of higher ISO values on noise in an image. These images are all taken with the lens cap on (fully dark). As you increase the ISO value, random noise starts to creep in:

Figure 1.15 – Example ISO values and noise in a dark room

Frame rate (FPS)

This is the rate at which the sensor can obtain consecutive images, usually expressed in Hz or Frames Per Second (FPS). Generally speaking, you want the fastest frame rate possible so that fast-moving objects are not blurry in your scene. The main trade-off is latency – the time from a real event happening until your computer vision algorithm detects it: a higher frame rate produces more data to process, and if your pipeline cannot keep up, the latency grows. In the following figure, you can see the effect of frame rate on motion blur.

Blur is not the only reason for choosing a higher frame rate. Depending on the speed of your vehicle, you will need a frame rate that will allow the vehicle to react if an object suddenly appears in its FoV. If your frame rate is too slow, by the time the vehicle sees something, it may be too late to react:

Figure 1.16 – 120 Hz versus 60 Hz frame rate, credit: https://gadgetstouse.com/blog/2020/03/18/difference-between-60hz-90hz-120hz-displays/
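To make the reaction-time argument concrete, here is a back-of-the-envelope calculation of how far a vehicle travels between consecutive frames; the speed and frame rates are illustrative numbers only:

# Distance traveled between frames at a given speed and frame rate
speed_kmh = 100                         # example vehicle speed
for fps in (10, 30, 60):
    meters_per_frame = (speed_kmh / 3.6) / fps
    print(f"{fps} FPS -> {meters_per_frame:.2f} m between frames")
# At 100 km/h and 10 FPS, the car moves almost 2.8 m between frames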

Lens flare

These are the artifacts of light from an object that impact pixels on the sensor that do not correlate with the position of the object in the environment. You have likely experienced this driving at night when you see oncoming headlights. That starry effect is due to light scattered in the lens of your eye (or camera), due to imperfections, leading some of the photons to impact "pixels" that do not correlate with where the photons came from – that is, the headlights. The following figure shows what that effect looks like. You can see that the starburst makes it very difficult to see the actual object, the car!

Figure 1.17 – Lens flare from oncoming headlights, credit: https://s.blogcdn.com/cars.aol.co.uk/media/2011/02/headlights-450-a-g.jpg

Lens distortion

This is the difference between the rectilinear (real) scene and what your camera image shows. If you have ever seen action camera footage, you have probably recognized the "fish-eye" lens effect. The following figure shows an extreme example of the distortion from a wide-angle lens. You will learn to correct this distortion with OpenCV:

Figure 1.18 – Lens distortion, credit: https://www.slacker.xyz/post/what-lens-should-i-get

The components of a camera

Like the eye, a camera is made up of a light-sensitive array, an aperture, and a lens.

Light sensitive array – CMOS sensor (the camera's retina)

The light-sensitive array, in most consumer cameras, is called a CMOS active-pixel sensor (or just a sensor). Its basic function is to convert incident photons into an electrical current that can be digitized based on the color wavelength of the photon.

The aperture (the camera's iris)

The aperture or iris of a camera is the opening through which light can pass on its way to the sensor. This can be variable or fixed depending on the type of camera you are using. The aperture is used to control parameters such as depth of field and the amount of light hitting the sensor.

The lens (the camera's lens)

The lens or optics are the components of the camera that focus the light from the environment onto the sensor. The lens primarily determines the FoV of the camera through its focal length. In self-driving applications, the FoV is very important since it determines how much of the environment the car can see with a single camera. The optics of a camera are often some of the most expensive parts and have a large impact on image quality and lens flare.

Considerations for choosing a camera

Now that you have learned all the basics of what a camera is and the relevant terminology, it is time to learn how to choose a camera for your self-driving application. The following is a list of the primary factors that you will need to balance when choosing a camera:

  • Resolution
  • FoV
  • Dynamic range
  • Cost
  • Size
  • Ingress protection (IP rating)

    The perfect camera

    If you could design the ideal camera, what would it be?

My perfect self-driving camera would be able to see in all directions (spherical FoV, 360º HFoV x 360º VFoV). It would have infinite resolution and dynamic range, so you could digitally resolve objects at any distance in any lighting condition. It would be the size of a grain of rice, completely water- and dustproof, and would cost $5! Obviously, this is not possible. So, we must make some careful trade-offs for what we need.

The best place to start is with your budget for cameras. This will give you an idea of what models and specifications to look for.

Next, consider what you need to see for your application:

  • Do you need to be able to see a child from 200 m away while traveling at 100 km/h?
  • What coverage around the vehicle do you need, and can you tolerate any blind spots on the side of the vehicle?
  • Do you need to see at night and during the day?

Lastly, consider how much room you have to integrate these cameras. You probably don't want your vehicle to look like this:

Figure 1.19 – Camera art, credit: https://www.flickr.com/photos/laughingsquid/1645856255/

This may be very overwhelming, but it is important when thinking about how to design your computer vision system. A good camera to start with that is very popular is the FLIR Blackfly S series. They strike an excellent balance of resolution, FPS, and cost. Next, pair it with a lens that meets your FoV needs. There are some helpful FoV calculators available on the internet, such as the one from http://www.bobatkins.com/photography/technical/field_of_view.html.
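If you prefer to do the arithmetic yourself, the horizontal FoV of a simple pinhole-model camera follows directly from the sensor width and the focal length; the sensor width below is a rough example value for a small machine-vision sensor, not the specification of any particular camera:

import math

def horizontal_fov(sensor_width_mm, focal_length_mm):
    # Pinhole-camera relation: FoV = 2 * atan(sensor_width / (2 * focal_length))
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))

# Example: a sensor roughly 7.2 mm wide behind a 6 mm lens
print(round(horizontal_fov(7.2, 6.0), 1), "degrees")   # about 62 degrees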

Strengths and weaknesses of cameras

Now, no sensor is perfect, and even your beloved camera will have its pros and cons. Let's go over some of them now.

Let's look at the strengths first:

  • High-resolution: Relative to other sensor types, such as radar, lidar, and sonar, cameras have an excellent resolution for picking out objects in your scene. You can easily find cameras with 5 MP resolution quite cheaply.
  • Texture, color, and contrast information: Cameras provide very rich information about the environment that other sensor types just can't. This is because of a variety of wavelengths that cameras sense.
  • Cost: Cameras are one of the cheapest sensors you can find, especially for the quality of data they provide.
  • Size: CMOS technology and modern ASICs have made cameras incredibly small, many less than 30 mm cubed.
  • Range: This is really thanks to the high resolution and passive nature of the sensor.

Next, here are the weaknesses:

  • A large amount of data to process for object detection: With high resolution comes a lot of data. Such is the price we pay for such accurate and detailed imagery.
  • Passive: A camera requires an external illumination source, such as the sun, headlights, and so on.
  • Obscurants (such as bugs, raindrops, heavy fog, dust, or snow): A camera is not particularly good at seeing through heavy rain, fog, dust, or snow. Radars are typically better suited for this.
  • Lack of native depth/velocity information: A camera image alone doesn't give you any information about an object's speed or distance.

    Photogrammetry is helping to bolster this weakness, but it costs valuable processing resources (GPU, CPU, latency, and so on) and is less accurate than a radar or lidar sensor, which produces this information natively.

Now that you have a good understanding of how a camera works, as well as its basic parts and terminology, it's time to get your hands dirty and start calibrating a camera with OpenCV.

Camera calibration with OpenCV

In this section, you will learn how to take objects with a known pattern and use them to correct lens distortion using OpenCV.

Remember the lens distortion we talked about in the previous section? You need to correct this to ensure you accurately locate where objects are relative to your vehicle. It does you no good to see an object if you don't know whether it is in front of you or next to you. Even good lenses can distort the image, and this is particularly true for wide-angle lenses. Luckily, OpenCV provides a mechanism to detect this distortion and correct it!

The idea is to take pictures of a chessboard, so OpenCV can use this high-contrast pattern to detect the position of the points and compute the distortion based on the difference between the expected image and the recorded one.

You need to provide several pictures at different orientations. It might take some experimentation to find a good set of pictures, but 10 to 20 images should be enough. If you use a printed chessboard, take care to keep the paper as flat as possible so as not to compromise the measurements:

Figure 1.20 – Some examples of pictures that can be used for calibration

As you can see, the central image clearly shows some barrel distortion.

Distortion detection

OpenCV tries to map a series of three-dimensional points to the two-dimensional coordinates of the camera. OpenCV will then use this information to correct the distortion.

The first thing to do is to initialize some structures:

image_points = []   # 2D points
object_points = []  # 3D points
coords = np.zeros((1, nX * nY, 3), np.float32)
coords[0, :, :2] = np.mgrid[0:nY, 0:nX].T.reshape(-1, 2)

Please note nX and nY, which are the number of points to find on the chessboard along the x and y axes, respectively. In practice, this is the number of squares minus 1.

Then, we need to call findChessboardCorners():

found, corners = cv2.findChessboardCorners(image, (nY, nX), None)

found is true if OpenCV found the points, and corners will contain the points found.

In our code, we will assume that the image has been converted into grayscale, but you can calibrate using an RGB picture as well.

OpenCV can draw the corners it found on the image, which is a nice way to check that the algorithm is working properly:

out = cv2.drawChessboardCorners(image, (nY, nX), corners, True)
object_points.append(coords)   # Save 3D points
image_points.append(corners)   # Save corresponding 2D points

Let's see the resulting image:

Figure 1.21 – Corners of the calibration image found by OpenCV

Calibration

After finding the corners in several images, we are finally ready to generate the calibration data using calibrateCamera():

ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(object_points, image_points, shape[::-1], None, None)

Now, we are ready to correct our images, using undistort():

dst = cv2.undistort(image, mtx, dist, None, mtx)

Let's see the result:

Figure 1.22 – Original image and calibrated image

We can see that the second image has less barrel distortion, but it is not great. We probably need more and better calibration samples.

But we can also try to get more precision out of the same calibration images by refining the detected corners to sub-pixel precision. This can be done by calling cornerSubPix() after findChessboardCorners():

corners = cv2.cornerSubPix(image, corners, (11, 11), (-1, -1), (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))

The following is the resulting image:

Figure 1.23 – Image calibrated with sub-pixel precision

As the complete code is a bit long, I recommend checking out the full source code on GitHub.
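For reference, here is a condensed sketch of the whole flow described above, assuming a folder of chessboard photos in calib/*.jpg and a board with nX x nY inner corners; adjust the path and the corner counts to match your own setup:

import glob
import cv2
import numpy as np

nX, nY = 6, 9                                    # Inner corners (squares minus 1)
coords = np.zeros((1, nX * nY, 3), np.float32)
coords[0, :, :2] = np.mgrid[0:nY, 0:nX].T.reshape(-1, 2)

object_points, image_points = [], []             # 3D and 2D points
shape = None

for path in glob.glob("calib/*.jpg"):
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    shape = gray.shape
    found, corners = cv2.findChessboardCorners(gray, (nY, nX), None)
    if not found:
        continue                                 # Skip images where the board was not detected
    corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1),
                               (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))
    object_points.append(coords)
    image_points.append(corners)

ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
    object_points, image_points, shape[::-1], None, None)

undistorted = cv2.undistort(cv2.imread("calib/test.jpg"), mtx, dist, None, mtx)
cv2.imwrite("undistorted.jpg", undistorted)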

Summary

Well, you have had a great start to your computer vision journey toward making a real self-driving car.

You learned about a very useful toolset called OpenCV with bindings for Python and NumPy. With these tools, you are now able to create and import images using methods such as imread(), imshow(), hconcat(), and vconcat(). You learned how to import and create video files, as well as how to capture video from a webcam, with methods such as VideoCapture() and VideoWriter(). Watch out Spielberg, there is a new movie-maker in town!

It was wonderful to be able to import images, but how do you start manipulating them to help your computer vision algorithms learn what features matter? You learned how to do this through methods such as flip(), blur(), GaussianBlur(), medianBlur(), bilateralFilter(), and convertScaleAbs(). Then, you learned how to annotate images for human consumption with methods such as rectangle() and putText().

Then came the real magic, where you learned how to take the images and do your first piece of real computer vision using HOG to detect pedestrians. You learned how to apply a sliding window to scan the detector over an image in various sized windows using the detectMultiScale() method, with parameters such as winStride, padding, scale, hitThreshold, and finalThreshold.

You had a lot of fun with all the new tools you learned for working with images. But there was something missing. How do I get these images on my self-driving car? To answer this, you learned about the camera and its basic terminology, such as resolution, FoV, focal length, aperture, DoF, dynamic range, ISO, frame rate, lens flare, and finally, lens distortion. Then, you learned the basic components that comprise a camera, namely the lens, aperture, and light-sensitive arrays. With these basics, you moved on to some considerations for choosing a camera for your application by learning about the strengths and weaknesses of a camera.

Armed with this knowledge, you boldly began to remove one of these weaknesses, lens distortion, with the tools you learned in OpenCV for distortion correction. You used methods such as findChessboardCorners(), calibrateCamera(), undistort(), and cornerSubPix() for this.

Wow, you are really on your way to being able to perceive the world in your self-driving application. You should take a moment and be proud of what you have learned so far. Maybe you can celebrate with a selfie and apply some of what you learned!

In the next chapter, you are going to learn about some of the basic signal types and protocols you are likely to encounter when trying to integrate sensors in your self-driving application.

Questions

  1. Can OpenCV take advantage of hardware acceleration?
  2. What's the best blurring method if CPU power is not a problem?
  3. Which detector can be used to find pedestrians in an image?
  4. How can you read the video stream from a webcam?
  5. What is the trade-off between aperture and depth of field?
  6. When do you need a high ISO?
  7. Is it worth computing sub-pixel precision for camera calibration?

Key benefits

  • Explore the building blocks of the visual perception system in self-driving cars
  • Identify objects and lanes to define the boundary of driving surfaces using open-source tools like OpenCV and Python
  • Improve the object detection and classification capabilities of systems with the help of neural networks

Description

The visual perception capabilities of a self-driving car are powered by computer vision. The work relating to self-driving cars can be broadly classified into three components - robotics, computer vision, and machine learning. This book provides existing computer vision engineers and developers with the unique opportunity to be associated with this booming field. You will learn about computer vision, deep learning, and depth perception applied to driverless cars. The book provides a structured and thorough introduction, as making a real self-driving car is a huge cross-functional effort. As you progress, you will cover relevant cases with working code, before going on to understand how to use OpenCV, TensorFlow and Keras to analyze video streaming from car cameras. Later, you will learn how to interpret and make the most of lidars (light detection and ranging) to identify obstacles and localize your position. You’ll even be able to tackle core challenges in self-driving cars such as finding lanes, detecting pedestrian and crossing lights, performing semantic segmentation, and writing a PID controller. By the end of this book, you’ll be equipped with the skills you need to write code for a self-driving car running in a driverless car simulator, and be able to tackle various challenges faced by autonomous car engineers.

Who is this book for?

This book is for software engineers who are interested in learning about technologies that drive the autonomous car revolution. Although basic knowledge of computer vision and Python programming is required, prior knowledge of advanced deep learning and how to use sensors (lidar) is not needed.

What you will learn

  • Understand how to perform camera calibration
  • Become well-versed with how lane detection works in self-driving cars using OpenCV
  • Explore behavioral cloning by self-driving in a video-game simulator
  • Get to grips with using lidars
  • Discover how to configure the controls for autonomous vehicles
  • Use object detection and semantic segmentation to locate lanes, cars, and pedestrians
  • Write a PID controller to control a self-driving car running in a simulator

Product Details

Publication date: Oct 23, 2020
Length: 374 pages
Edition: 1st
Language: English
ISBN-13: 9781800201934



Table of Contents

Section 1: OpenCV and Sensors and Signals
Chapter 1: OpenCV Basics and Camera Calibration
Chapter 2: Understanding and Working with Signals
Chapter 3: Lane Detection
Section 2: Improving How the Self-Driving Car Works with Deep Learning and Neural Networks
Chapter 4: Deep Learning with Neural Networks
Chapter 5: Deep Learning Workflow
Chapter 6: Improving Your Neural Network
Chapter 7: Detecting Pedestrians and Traffic Lights
Chapter 8: Behavioral Cloning
Chapter 9: Semantic Segmentation
Section 3: Mapping and Controls
Chapter 10: Steering, Throttle, and Brake Control
Chapter 11: Mapping Our Environments
Assessments
Other Books You May Enjoy

