From Learn OpenCV 4 by Building Projects: Build real-world computer vision and image processing applications with OpenCV and C++

Product type: Paperback
Published: Nov 2018
Publisher: Packt
ISBN-13: 9781789341225
Length: 310 pages
Edition: 2nd Edition
Languages: C++
Tools: OpenCV
Concepts: Computer Vision
Authors (3): David Millán Escrivá, Prateek Joshi, Vinícius G. Mendonça


What can you do with OpenCV?

Using OpenCV, you can tackle almost any computer vision task you can think of. Real-life problems usually require you to combine several computer vision algorithms and modules to achieve the desired result, so the key skill is knowing which OpenCV modules and functions to use to get what you want.

Let's look at what OpenCV can do out of the box.

Inbuilt data structures and input/output

One of the best things about OpenCV is that it provides many built-in primitives for operations related to image processing and computer vision. If you had to write everything from scratch, you would have to define Image, Point, Rectangle, and so on. These are fundamental to almost any computer vision algorithm.

OpenCV comes with all these basic structures out of the box, in the core module (for example, cv::Mat for images, cv::Point, and cv::Rect). Another advantage is that these structures have already been optimized for speed and memory, so you don't have to worry about the implementation details.
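To see what writing these from scratch would involve, here is a minimal plain C++ sketch of two such primitives. The names Point, Rect, and intersect are hypothetical stand-ins, not OpenCV code; cv::Rect offers the same operations through its contains() method and the & operator.

```cpp
#include <algorithm>

// Hypothetical from-scratch primitives (not OpenCV code), comparable to
// cv::Point and cv::Rect from the core module.
struct Point {
    int x;
    int y;
};

struct Rect {
    int x;       // column of the top-left corner
    int y;       // row of the top-left corner
    int width;
    int height;

    int area() const { return width * height; }

    // True if p is inside; the right and bottom edges are excluded.
    bool contains(Point p) const {
        return p.x >= x && p.x < x + width && p.y >= y && p.y < y + height;
    }
};

// Overlapping part of two rectangles, or an all-zero Rect if they do not overlap.
Rect intersect(const Rect& a, const Rect& b) {
    int x1 = std::max(a.x, b.x);
    int y1 = std::max(a.y, b.y);
    int x2 = std::min(a.x + a.width, b.x + b.width);
    int y2 = std::min(a.y + a.height, b.y + b.height);
    if (x2 <= x1 || y2 <= y1) return Rect{0, 0, 0, 0};
    return Rect{x1, y1, x2 - x1, y2 - y1};
}
```

Even this small piece needs decisions (are edges inclusive? what does an empty result look like?) that OpenCV has already made consistently for you.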

The imgcodecs module handles reading and writing image files. When you operate on an input image and create an output image, you can save it as a .jpg or a .png file with a single call to cv::imwrite, and load images just as easily with cv::imread.

You will be dealing with a lot of video files when you work with cameras. The videoio module handles everything related to video input and output. With cv::VideoCapture, you can easily capture video from a webcam or read a video file in many different formats, and with cv::VideoWriter you can save a sequence of frames as a video file by setting properties such as frames per second and frame size.

Image processing operations

When you write a computer vision algorithm, there are a lot of basic image processing operations that you will use over and over again. Most of these functions are present in the imgproc module. You can do things such as image filtering, morphological operations, geometric transformations, color conversions, drawing on images, histograms, shape analysis, motion analysis, feature detection, and more.

Let's consider the following photo:

The right image is a rotated version of the one on the left. We can carry out this transformation with a single line in OpenCV.
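For multiples of 90 degrees, cv::rotate (for example, with cv::ROTATE_90_CLOCKWISE) does the job in one line. For arbitrary angles, cv::getRotationMatrix2D builds a 2x3 affine matrix from a center, an angle in degrees, and a scale, and cv::warpAffine then resamples the image with it. The plain C++ sketch below reproduces that documented matrix formula and applies it to single points; rotationMatrix and transformPoint are hypothetical helper names, and the pixel resampling step is left out.

```cpp
#include <array>
#include <cmath>

// 2x3 affine matrix [a b tx; c d ty] mapping (x, y) to (a*x + b*y + tx, c*x + d*y + ty).
using Affine = std::array<std::array<double, 3>, 2>;

// Rotation by angleDeg degrees about (cx, cy), combined with a uniform scale.
// Same formula as documented for cv::getRotationMatrix2D: positive angles turn
// counter-clockwise on screen, where the y axis points down.
Affine rotationMatrix(double cx, double cy, double angleDeg, double scale) {
    const double kPi = 3.14159265358979323846;
    const double alpha = scale * std::cos(angleDeg * kPi / 180.0);
    const double beta = scale * std::sin(angleDeg * kPi / 180.0);
    return {{{alpha, beta, (1.0 - alpha) * cx - beta * cy},
             {-beta, alpha, beta * cx + (1.0 - alpha) * cy}}};
}

// Where the source point (x, y) lands after the transform.
std::array<double, 2> transformPoint(const Affine& m, double x, double y) {
    return {m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2]};
}
```

With a 90-degree angle about (10, 10), the center stays put and the point (20, 10), to the right of the center, moves to (10, 0), directly above it.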

There is another module, called ximgproc, which contains advanced image processing algorithms such as structured forests for edge detection, domain transform filter, adaptive manifold filter, and so on.

GUI

OpenCV provides a module called highgui that handles all the high-level user interface operations. Let's say you are working on a problem, and you want to check what the image looks like before you proceed to the next step. This module has functions that can be used to create windows to display images and/or videos.

The cv::waitKey function waits until you press a key (or until a given number of milliseconds has passed) before the program goes on to the next step. There is also a function, cv::setMouseCallback, that lets you react to mouse events. This is very useful when developing interactive applications.

Using this functionality, you can draw rectangles on those input windows, and then proceed based on the selected region. Consider the following screenshot:

As you can see, we drew a green rectangle on top of the window. Once we have the coordinates of that rectangle, we can operate only on that region.
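Once you have the rectangle's coordinates, restricting work to that region amounts to cropping. In OpenCV, image(rect) returns a cv::Mat header that shares its pixels with the original image. The plain C++ sketch below, built around a hypothetical GrayImage struct and crop function, copies the region instead, after clipping the rectangle to the image borders.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit grayscale image stored row by row.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;  // width * height values

    std::uint8_t at(int x, int y) const { return pixels[y * width + x]; }
};

// Copy the region with top-left (rx, ry) and size rw x rh into a new image,
// after clipping the region to the image borders.
GrayImage crop(const GrayImage& img, int rx, int ry, int rw, int rh) {
    int x0 = std::max(rx, 0);
    int y0 = std::max(ry, 0);
    int x1 = std::min(rx + rw, img.width);
    int y1 = std::min(ry + rh, img.height);
    GrayImage out{std::max(x1 - x0, 0), std::max(y1 - y0, 0), {}};
    out.pixels.reserve(static_cast<std::size_t>(out.width) * out.height);
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x)
            out.pixels.push_back(img.at(x, y));
    return out;
}
```

Copying keeps the sketch simple; OpenCV's shared view avoids the copy, so changes made through the region show up in the original image.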

Video analysis

Video analysis includes tasks such as analyzing the motion between successive frames in a video, tracking different objects in a video, creating models for video surveillance, and so on. OpenCV provides a module called video that can handle all of this.
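The simplest way to detect motion between successive frames is frame differencing: compare each pixel with the same pixel in the previous frame and flag large changes. The sketch below assumes two equally sized 8-bit grayscale frames stored as flat vectors; motionMask is a hypothetical name. In OpenCV you would typically use cv::absdiff and cv::threshold for this, or the video module's background subtractors, such as cv::createBackgroundSubtractorMOG2, which model the background statistically instead of comparing just two frames.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Frame differencing: 255 where |curr - prev| exceeds the threshold, else 0.
// Both frames are assumed to have the same size and layout.
std::vector<std::uint8_t> motionMask(const std::vector<std::uint8_t>& prev,
                                     const std::vector<std::uint8_t>& curr,
                                     int threshold) {
    std::vector<std::uint8_t> mask(curr.size(), 0);
    for (std::size_t i = 0; i < curr.size() && i < prev.size(); ++i) {
        int diff = std::abs(static_cast<int>(curr[i]) - static_cast<int>(prev[i]));
        mask[i] = static_cast<std::uint8_t>(diff > threshold ? 255 : 0);
    }
    return mask;
}
```

The threshold trades sensitivity for robustness: too low and sensor noise shows up as motion, too high and slow-moving objects are missed.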

There is also a module called videostab that deals with video stabilization. Stabilization matters because videos captured with a handheld camera usually contain a lot of shake that needs correcting. Most modern cameras and smartphones stabilize video before presenting it to the user.

3D reconstruction

3D reconstruction is an important topic in computer vision. Given a set of 2D images, we can reconstruct the 3D scene using relevant algorithms. OpenCV's calib3d module provides algorithms that use the relationships between corresponding points in those 2D images to compute their 3D positions.

This module can also handle camera calibration, which estimates the parameters of the camera, such as its focal length, principal point, and lens distortion. These parameters define how the camera maps the scene in front of it onto the image. Algorithms that measure geometry from images need to know them; without them, the results can be inaccurate or unexpected.
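To make these parameters concrete: for an ideal pinhole camera, the intrinsic parameters are the focal lengths fx, fy and the principal point (cx, cy), all in pixels, and they map a 3D point (X, Y, Z) in camera coordinates to the pixel (fx·X/Z + cx, fy·Y/Z + cy). The sketch below ignores lens distortion, which cv::calibrateCamera also estimates and cv::projectPoints applies; Intrinsics and project are hypothetical names.

```cpp
#include <array>

// Hypothetical intrinsic parameters of an ideal pinhole camera, in pixels.
struct Intrinsics {
    double fx, fy;  // focal lengths
    double cx, cy;  // principal point (where the optical axis meets the image)
};

// Project a point (X, Y, Z) given in camera coordinates (Z > 0, in front of
// the camera) to pixel coordinates (u, v). Lens distortion is ignored.
std::array<double, 2> project(const Intrinsics& k, double X, double Y, double Z) {
    return {k.fx * X / Z + k.cx, k.fy * Y / Z + k.cy};
}
```

Two points on the same ray from the camera, such as (1, 0, 2) and (2, 0, 4), land on the same pixel. A single image loses depth, which is why reconstruction needs several views.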

Let's consider the following diagram:

As we can see here, the same object is captured from multiple positions. Our job is to reconstruct the original object using these 2D images.

Feature extraction

As we discussed earlier, the human visual system tends to extract the salient features from a given scene to remember it for retrieval later. To mimic this, people started designing various feature extractors that can extract these salient points from a given image. Popular algorithms include Scale Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF), and Features from Accelerated Segment Test (FAST).

An OpenCV module called features2d provides functions to detect and extract many of these features, such as FAST and ORB. Another module called xfeatures2d provides more feature extractors (in OpenCV 4.0, this included the patented SIFT and SURF), some of which are still in the experimental phase. You can play around with these if you get the chance.
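The segment test at the heart of FAST is short enough to sketch. A pixel is treated as a corner when at least n contiguous pixels on a 16-pixel circle of radius 3 around it are all brighter than the center by more than a threshold t, or all darker by more than t. The code below is a simplified, hypothetical isFastCorner for an 8-bit grayscale image stored row by row; it uses n = 9 by default and skips the high-speed pretest and the non-maximum suppression that cv::FAST and cv::FastFeatureDetector provide.

```cpp
#include <cstdint>
#include <initializer_list>
#include <vector>

// Offsets (dx, dy) of the 16-pixel circle of radius 3 used by the FAST segment test.
const int kCircle[16][2] = {{0, -3},  {1, -3},  {2, -2},  {3, -1},
                            {3, 0},   {3, 1},   {2, 2},   {1, 3},
                            {0, 3},   {-1, 3},  {-2, 2},  {-3, 1},
                            {-3, 0},  {-3, -1}, {-2, -2}, {-1, -3}};

// True if at least n contiguous circle pixels are all brighter than center + t
// or all darker than center - t. (x, y) must be at least 3 pixels from the border.
bool isFastCorner(const std::vector<std::uint8_t>& img, int width, int x, int y,
                  int t, int n = 9) {
    int center = img[y * width + x];
    for (int sign : {1, -1}) {  // 1: look for a brighter arc, -1: a darker arc
        int run = 0;
        // Walk the circle twice so that arcs wrapping past index 15 are counted.
        for (int i = 0; i < 32; ++i) {
            const int* off = kCircle[i % 16];
            int p = img[(y + off[1]) * width + (x + off[0])];
            bool passes = sign > 0 ? p > center + t : p < center - t;
            run = passes ? run + 1 : 0;
            if (run >= n) return true;
        }
    }
    return false;
}
```

Because the test only compares intensities, it needs no gradients or floating-point math, which is what makes FAST fast.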

There is also a module called bioinspired that provides algorithms for biologically-inspired computer vision models.

Object detection

Object detection refers to detecting the location of an object in a given image. The detector tells you where objects of the category it was built for are, not their finer details. If you design a chair detector, it will not tell you whether the chair in a given image is red with a high back or blue with a low back; it will just tell you the location of the chair.

Detecting the location of objects is a critical step in many computer vision systems. Consider the following photo:

If you run a chair detector on this image, it will put a green box around all the chairs, but it won't tell you what kind of chair it is.

Object detection used to be a computationally intensive task because of the number of calculations required to perform the detection at various scales. To solve this, Paul Viola and Michael Jones came up with a great algorithm in their seminal 2001 paper, which you can read at the following link: https://www.cs.cmu.edu/~efros/courses/LBMV07/Papers/viola-cvpr-01.pdf. Their method combines simple rectangle features computed from an integral image, AdaBoost to select the most useful features, and a cascade of classifiers that rejects most background regions early. Together, these gave a fast way to design an object detector for almost any object.
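The integral image is what makes those rectangle features cheap: after one pass over the image, the sum of any axis-aligned rectangle costs four lookups, so Haar-like features (differences of rectangle sums) can be evaluated at every position and scale quickly. Below is a plain C++ sketch with hypothetical names (integralImage, rectSum); OpenCV computes the same table with cv::integral, and cv::CascadeClassifier evaluates cascades of such features.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Integral image with an extra leading row and column of zeros, stored row by
// row with stride w + 1: entry (x, y) holds the sum of all pixels above and to
// the left of (x, y), i.e. of img rows 0..y-1 and columns 0..x-1.
std::vector<long long> integralImage(const std::vector<std::uint8_t>& img,
                                     int w, int h) {
    const int s = w + 1;
    std::vector<long long> ii(static_cast<std::size_t>(s) * (h + 1), 0);
    for (int y = 1; y <= h; ++y)
        for (int x = 1; x <= w; ++x)
            ii[y * s + x] = img[(y - 1) * w + (x - 1)]
                          + ii[(y - 1) * s + x]
                          + ii[y * s + (x - 1)]
                          - ii[(y - 1) * s + (x - 1)];
    return ii;
}

// Sum of the rw x rh rectangle whose top-left pixel is (x, y): four lookups,
// whatever the rectangle's size.
long long rectSum(const std::vector<long long>& ii, int w,
                  int x, int y, int rw, int rh) {
    const int s = w + 1;
    return ii[(y + rh) * s + (x + rw)] - ii[y * s + (x + rw)]
         - ii[(y + rh) * s + x] + ii[y * s + x];
}
```

A two-rectangle Haar-like feature is then simply rectSum over one region minus rectSum over the adjacent region.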

OpenCV has modules called objdetect and xobjdetect that provide the framework to design an object detector. You can use them to develop detectors for arbitrary items such as sunglasses, boots, and so on.

Machine learning

Machine learning algorithms are used extensively to build computer vision systems for object recognition, image classification, face detection, visual search, and so on.

OpenCV provides a module called ml, which has many machine learning algorithms bundled into it, including a Bayes classifier, k-nearest neighbors (KNN), support vector machines (SVM), decision trees, neural networks, and more.
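To get a feel for one of these algorithms, here is a minimal k-nearest neighbors classifier: it labels a query by majority vote among the k training samples closest to it. This is a plain C++ sketch with hypothetical names (Sample, knnClassify) that uses squared Euclidean distance and breaks ties toward the smaller label; in OpenCV, the ready-made equivalent is cv::ml::KNearest.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical training sample: a feature vector and its class label.
struct Sample {
    std::vector<double> features;
    int label;
};

// Majority vote among the k nearest training samples (squared Euclidean
// distance). Ties go to the smaller label; returns -1 if there is no data.
int knnClassify(const std::vector<Sample>& train, const std::vector<double>& query, int k) {
    std::vector<std::pair<double, int>> dist;  // (squared distance, label)
    dist.reserve(train.size());
    for (const Sample& s : train) {
        double d = 0.0;
        for (std::size_t i = 0; i < query.size(); ++i) {
            double diff = s.features[i] - query[i];
            d += diff * diff;
        }
        dist.emplace_back(d, s.label);
    }
    k = std::min(k, static_cast<int>(dist.size()));
    if (k <= 0) return -1;
    // Move the k smallest distances to the front.
    std::partial_sort(dist.begin(), dist.begin() + k, dist.end());
    std::map<int, int> votes;  // label -> number of neighbors with that label
    for (int i = 0; i < k; ++i) ++votes[dist[i].second];
    int best = -1, bestCount = 0;
    for (const auto& v : votes) {  // labels are visited in ascending order
        if (v.second > bestCount) {
            best = v.first;
            bestCount = v.second;
        }
    }
    return best;
}
```

KNN has no training step beyond storing the samples, so all the cost moves to query time; that cost is what fast nearest neighbor search libraries aim to reduce.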

OpenCV also has a module called flann, after the Fast Library for Approximate Nearest Neighbors (FLANN), which contains algorithms for fast nearest neighbor searches in large datasets.

Computational photography

Computational photography refers to using advanced image processing techniques to improve the images captured by cameras. Instead of focusing on optical processes and image capture methods, computational photography uses software to manipulate visual data. Applications include high dynamic range imaging, panoramic images, image relighting, and light field cameras.

Let's look at the following image:

Look at those vivid colors! This is an example of a high dynamic range image, and it wouldn't be possible to get this with a single conventional exposure. To create it, we have to capture the same scene at multiple exposures, register those images with each other, and then merge them into one image.
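The merge step can be sketched for a single pixel. Each exposure gives an estimate of scene radiance (pixel value divided by exposure time), and the estimates are averaged with a hat-shaped weight that ignores nearly black or saturated values, as in Debevec and Malik's method. The sketch below assumes, hypothetically, an already registered image set and a perfectly linear camera response; hatWeight and mergeRadiance are hypothetical names. Real cameras are not linear, which is why OpenCV's photo module pairs cv::createMergeDebevec with cv::createCalibrateDebevec to recover the response curve.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hat-shaped weight: trust mid-tones, ignore values near 0 or 255.
double hatWeight(std::uint8_t z) {
    return z <= 127 ? static_cast<double>(z) : 255.0 - z;
}

// Radiance estimate for one pixel seen in several exposures, assuming a
// perfectly linear camera response (radiance proportional to value / time).
double mergeRadiance(const std::vector<std::uint8_t>& values,
                     const std::vector<double>& exposureTimes) {
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < values.size() && i < exposureTimes.size(); ++i) {
        double w = hatWeight(values[i]);
        num += w * values[i] / exposureTimes[i];
        den += w;
    }
    return den > 0.0 ? num / den : 0.0;  // 0 if every exposure was unusable
}
```

A saturated value of 255 gets weight zero, so a clipped long exposure does not drag the estimate down. The merged radiance map still needs tone mapping before it can be shown on an ordinary display.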

The photo and xphoto modules contain various algorithms for computational photography. There is also a module called stitching that provides algorithms to create panoramic images.

The image shown can be found here: https://pixabay.com/en/hdr-high-dynamic-range-landscape-806260/.

Shape analysis

The notion of shape is crucial in computer vision. We analyze visual data by recognizing the different shapes in an image, which is an important step in many algorithms.

Let's say you are trying to identify a particular logo in an image. You know that it can appear in various shapes, orientations, and sizes. One good way to get started is to quantify the characteristics of the shape of the object.
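A common way to quantify shape is with measures that do not depend on size or orientation. The sketch below computes a polygon's area with the shoelace formula, its perimeter, and its circularity 4πA/P², which is 1 for a circle and π/4 for any square. Pt, polygonArea, polygonPerimeter, and circularity are hypothetical names; on real contours, OpenCV's cv::contourArea and cv::arcLength give the area and perimeter.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 2D vertex; polygons are lists of vertices in order.
struct Pt {
    double x, y;
};

// Area via the shoelace formula; the polygon is closed implicitly.
double polygonArea(const std::vector<Pt>& poly) {
    double twice = 0.0;
    for (std::size_t i = 0; i < poly.size(); ++i) {
        const Pt& a = poly[i];
        const Pt& b = poly[(i + 1) % poly.size()];
        twice += a.x * b.y - b.x * a.y;
    }
    return std::abs(twice) / 2.0;
}

// Sum of the edge lengths, including the closing edge.
double polygonPerimeter(const std::vector<Pt>& poly) {
    double p = 0.0;
    for (std::size_t i = 0; i < poly.size(); ++i) {
        const Pt& a = poly[i];
        const Pt& b = poly[(i + 1) % poly.size()];
        p += std::hypot(b.x - a.x, b.y - a.y);
    }
    return p;
}

// 4*pi*A / P^2: unchanged by scaling and rotation, 1 for a circle.
double circularity(const std::vector<Pt>& poly) {
    const double kPi = 3.14159265358979323846;
    double p = polygonPerimeter(poly);
    return p > 0.0 ? 4.0 * kPi * polygonArea(poly) / (p * p) : 0.0;
}
```

Because circularity ignores size and orientation, a logo seen larger or rotated yields roughly the same value, which is exactly the property needed for the logo example above.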

The shape module provides algorithms to compare shapes, measure the similarity between them (for example, with shape context or Hausdorff distances), transform the shapes of objects, and more.

Optical flow algorithms

Optical flow algorithms are used in videos to track features across successive frames. Let's say you want to track a particular object in a video. Running a feature extractor on every frame would be computationally expensive and therefore slow. Instead, you extract the features once, from the current frame, and then track those features in successive frames.
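The classic Lucas-Kanade method tracks a point by assuming that its small neighborhood moves as a whole between two frames: each pixel contributes one equation Ix·u + Iy·v + It = 0, and the flow (u, v) is the least-squares solution over the window. Below is a single-window, single-scale sketch with hypothetical names (Frame, Flow, lucasKanade); OpenCV's cv::calcOpticalFlowPyrLK in the video module adds image pyramids and iterative refinement so that it can follow larger motions.

```cpp
#include <cmath>
#include <vector>

// Hypothetical grayscale frame stored row by row.
struct Frame {
    int width, height;
    std::vector<double> data;
    double at(int x, int y) const { return data[y * width + x]; }
};

struct Flow {
    double u, v;  // estimated displacement in pixels
    bool ok;      // false when the window does not constrain both directions
};

// One Lucas-Kanade step for the window of radius r centred on (cx, cy).
// Solves [Sxx Sxy; Sxy Syy] [u v]^T = -[Sxt Syt]^T. The window plus a
// 1-pixel margin must lie inside both frames.
Flow lucasKanade(const Frame& prev, const Frame& next, int cx, int cy, int r) {
    double sxx = 0, sxy = 0, syy = 0, sxt = 0, syt = 0;
    for (int y = cy - r; y <= cy + r; ++y) {
        for (int x = cx - r; x <= cx + r; ++x) {
            double ix = (prev.at(x + 1, y) - prev.at(x - 1, y)) / 2.0;  // central differences
            double iy = (prev.at(x, y + 1) - prev.at(x, y - 1)) / 2.0;
            double it = next.at(x, y) - prev.at(x, y);                   // temporal change
            sxx += ix * ix;
            sxy += ix * iy;
            syy += iy * iy;
            sxt += ix * it;
            syt += iy * it;
        }
    }
    double det = sxx * syy - sxy * sxy;
    if (std::abs(det) < 1e-9) return {0.0, 0.0, false};  // aperture problem
    double u = (-sxt * syy + syt * sxy) / det;
    double v = (-syt * sxx + sxt * sxy) / det;
    return {u, v, true};
}
```

Because it relies on a first-order approximation, this sketch is only accurate for motions of about a pixel, and it reports ok = false when all the gradients in the window point the same way.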

Optical flow algorithms are heavily used in video-based applications in computer vision. Classic methods such as pyramidal Lucas-Kanade and Farnebäck's dense optical flow are part of the video module, and the optflow module adds many more optical flow algorithms. There is also a module called tracking that contains algorithms for tracking objects across frames.

Face and object recognition

Face recognition refers to identifying the person in a given image. This is not the same as face detection, where you simply identify the location of a face in the given image.

If you want to build a practical biometric system that can recognize the person in front of the camera, you first need to run a face detector to identify the location of the face, and then run a separate face recognizer to identify who the person is. There is an OpenCV module called face that deals with face recognition.
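Among other recognizers, OpenCV's face module includes the Local Binary Patterns Histograms recognizer (cv::face::LBPHFaceRecognizer). Its basic building block is the LBP code sketched below: each of a pixel's 8 neighbors is compared with the pixel itself, and the results are packed into one byte. Histograms of these codes over a grid of cells then describe the face. lbpCode is a hypothetical name, and the neighbor order used here (clockwise from the top-left) is an arbitrary choice that need not match OpenCV's.

```cpp
#include <cstdint>
#include <vector>

// 8-neighbour Local Binary Pattern code of pixel (x, y) in a row-major 8-bit
// image: each neighbour >= the centre sets one bit, clockwise from the
// top-left neighbour (most significant bit). (x, y) must not be on the border.
std::uint8_t lbpCode(const std::vector<std::uint8_t>& img, int width, int x, int y) {
    static const int kOffsets[8][2] = {{-1, -1}, {0, -1}, {1, -1}, {1, 0},
                                       {1, 1},   {0, 1},  {-1, 1}, {-1, 0}};
    std::uint8_t center = img[y * width + x];
    std::uint8_t code = 0;
    for (int i = 0; i < 8; ++i) {
        std::uint8_t n = img[(y + kOffsets[i][1]) * width + (x + kOffsets[i][0])];
        if (n >= center) code |= static_cast<std::uint8_t>(1u << (7 - i));
    }
    return code;
}
```

Since the code depends only on whether neighbors are brighter or darker, it is unaffected by uniform changes in lighting, which helps when the same face is photographed under different conditions.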

As we discussed earlier, computer vision tries to model algorithms based on how humans perceive visual data. So, it would be helpful to find salient regions and objects in the images that can help with different applications such as object recognition, object detection and tracking, and so on. There is a module called saliency that's designed for this purpose. It provides algorithms that can detect salient regions in static images and videos.

Surface matching

We are increasingly interacting with devices that can capture the 3D structure of the objects around us. These devices essentially capture depth information, along with the regular 2D color images. So, it's important for us to build algorithms that can understand and process 3D objects.

Kinect is a good example of a device that captures depth information along with the visual data. The task at hand is to recognize the input 3D object by matching it to one of the models in our database. If we have a system that can recognize and locate objects, then it can be used for many different applications.

There is a module called surface_matching that contains algorithms for 3D object recognition and a pose estimation algorithm using 3D features.
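The surface_matching module is built around the point pair features of Drost et al. (see cv::ppf_match_3d::PPF3DDetector). For two surface points p1, p2 with normals n1, n2, and d = p2 - p1, the feature is F = (|d|, angle(n1, d), angle(n2, d), angle(n1, n2)). F does not change when the object is rotated or moved, so pairs from the scene can be matched against pairs from the model. The sketch below only computes F; the hashing and voting that turn matches into a pose are left out, and all names are hypothetical.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

double dot3(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }
double length3(const Vec3& a) { return std::sqrt(dot3(a, a)); }

// Angle between two non-zero vectors, in radians.
double angleBetween(const Vec3& a, const Vec3& b) {
    double c = dot3(a, b) / (length3(a) * length3(b));
    return std::acos(std::fmax(-1.0, std::fmin(1.0, c)));  // clamp rounding errors
}

// Point pair feature (|d|, angle(n1, d), angle(n2, d), angle(n1, n2)) for two
// oriented points (p1, n1) and (p2, n2), where d = p2 - p1.
std::array<double, 4> pointPairFeature(const Vec3& p1, const Vec3& n1,
                                       const Vec3& p2, const Vec3& n2) {
    Vec3 d{p2[0] - p1[0], p2[1] - p1[1], p2[2] - p1[2]};
    return {length3(d), angleBetween(n1, d), angleBetween(n2, d), angleBetween(n1, n2)};
}
```

Two points on a flat surface, for example, give two right angles and a zero angle between the normals, whatever the plane's position or orientation.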

Text detection and recognition

Identifying text in a given scene and recognizing the content is becoming increasingly important. Applications include number plate recognition, recognizing road signs for self-driving cars, book scanning to digitize content, and more.

There is a module called text that contains various algorithms to handle text detection and recognition.

Deep learning

Deep learning has had a big impact on computer vision and image recognition, and on many tasks it achieves higher accuracy than other machine learning and artificial intelligence approaches. Deep learning is not a new concept; it was introduced to the community around 1986, but it started a revolution around 2012, when GPU hardware made massively parallel computation affordable, and efficient Convolutional Neural Network (CNN) implementations and other techniques made it possible to train complex neural network architectures in reasonable times.

Deep learning can be applied to multiple use cases such as image recognition, object detection, voice recognition, and natural language processing. OpenCV has included deep learning algorithms in its dnn module since version 3.3; in the latest version, multiple importers for important frameworks such as TensorFlow and Caffe have been added.
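At its core, a convolutional layer slides small learned kernels over its input, adds a bias, and applies a nonlinearity such as ReLU, max(x, 0). The plain C++ sketch below does this for one 3x3 kernel on a single-channel input with no padding; conv3x3Relu is a hypothetical name. In practice you would not write layers yourself: OpenCV's dnn module loads trained networks, for example with cv::dnn::readNetFromTensorflow or cv::dnn::readNetFromCaffe, and runs them with Net::forward.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// "Valid" 3x3 convolution plus bias, followed by ReLU, on a row-major w x h
// input. Like most deep learning frameworks, this computes cross-correlation
// (the kernel is not flipped). The output is (w - 2) x (h - 2).
std::vector<float> conv3x3Relu(const std::vector<float>& in, int w, int h,
                               const float k[3][3], float bias) {
    if (w < 3 || h < 3) return {};
    const int ow = w - 2, oh = h - 2;
    std::vector<float> out(static_cast<std::size_t>(ow) * oh);
    for (int y = 0; y < oh; ++y)
        for (int x = 0; x < ow; ++x) {
            float s = bias;
            for (int ky = 0; ky < 3; ++ky)
                for (int kx = 0; kx < 3; ++kx)
                    s += k[ky][kx] * in[(y + ky) * w + (x + kx)];
            out[y * ow + x] = std::max(s, 0.0f);  // ReLU
        }
    return out;
}
```

With a horizontal-gradient kernel, the layer responds to intensity increasing from left to right and, because of ReLU, outputs zero for the opposite direction; a trained CNN learns many such kernels instead of having them designed by hand.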