What can you do with OpenCV?
Using OpenCV, you can pretty much do every Computer Vision task that you can think of. Real-life problems require you to use many blocks together to achieve the desired result. So, you just need to understand what modules and functions to use to get what you want. Let's understand what OpenCV can do out of the box.
In-built data structures and input/output
One of the best things about OpenCV is that it provides a lot of in-built primitives to handle operations related to image processing and Computer Vision. If you had to write everything from scratch, you would have to define things such as an image, a point, a rectangle, and so on. These are fundamental to almost any Computer Vision algorithm. OpenCV comes with all these basic structures out of the box, contained in the core module. Another advantage is that these structures have already been optimized for speed and memory, so you don't have to worry about the implementation details.
The imgcodecs module handles reading and writing image files. When you operate on an input image and create an output image, you can save it as a jpg or png file with a simple command. You will be dealing with a lot of video files when you are working with cameras. The videoio module handles everything related to the input/output of video files. You can easily capture a video from a webcam or read a video file in many different formats. You can even save a bunch of frames as a video file by setting properties such as frames per second, frame size, and so on.
Image processing operations
When you write a Computer Vision algorithm, there are a lot of basic image processing operations that you will use over and over again. Most of these functions are present in the imgproc module. You can do things such as image filtering, morphological operations, geometric transformations, color conversions, drawing on images, histograms, shape analysis, motion analysis, feature detection, and so on. Let's consider the following figure:
The right-hand side image is a rotated version of the left-hand side image. We can do this transformation with a single line in OpenCV. There is another module called ximgproc that contains advanced image processing algorithms such as structured forests for edge detection, domain transform filters, adaptive manifold filters, and so on.
User interface
OpenCV provides a module called highgui that handles all the high-level user interface operations. Let's say that you are working on a problem and you want to check what the image looks like before you proceed to the next step. This module has functions that can be used to create windows to display images and/or video. There is also a waiting function that will wait until you hit a key on your keyboard before it goes to the next step. There is a function that can detect mouse events as well, which is very useful for developing interactive applications. Using this functionality, you can draw rectangles on these input windows and then proceed based on the selected region.
Consider the following image:
As you can see, we have drawn a green rectangle on the image and applied a negative film effect to that region. Once we have the coordinates of this rectangle, we can operate only on that region.
Video analysis
Video analysis includes tasks such as analyzing the motion between successive frames in a video, tracking different objects in a video, creating models for video surveillance, and so on. OpenCV provides a module called video that can handle all of this. There is also a module called videostab that deals with video stabilization. Video stabilization is an important part of video cameras. When you capture videos by holding the camera in your hands, it's hard to keep them perfectly steady, and the raw footage looks jittery. All modern devices use video stabilization techniques to process videos before they are presented to the end user.
3D reconstruction
3D reconstruction is an important topic in Computer Vision. Given a set of 2D images, we can reconstruct the 3D scene using the relevant algorithms. OpenCV provides algorithms that can find the relationship between various objects in these 2D images to compute their 3D positions. The calib3d module handles all of this. It can also handle camera calibration, which is essential for estimating the parameters of the camera. These are basically the internal parameters of the camera that determine how the captured scene is transformed into an image. We need to know these parameters to design algorithms, or else we might get unexpected results. Let's consider the following figure:
As shown in the preceding image, the same object is captured from multiple poses. Our job is to reconstruct the original object using these 2D images.
Feature extraction
As discussed earlier, the human visual system tends to extract the salient features from a given scene so that it can be retrieved later. To mimic this, people started designing various feature extractors that can extract these salient points from a given image. Some of the popular algorithms include SIFT (Scale Invariant Feature Transform), SURF (Speeded Up Robust Features), and FAST (Features from Accelerated Segment Test). The features2d module provides functions to detect and extract all these features. There is another module called xfeatures2d that provides a few more feature extractors, some of which are still in the experimental phase. You can play around with these if you get a chance. There is also a module called bioinspired that provides algorithms for biologically inspired Computer Vision models.
Object detection
Object detection refers to detecting the location of an object in a given image. This process is not concerned with the type of object. If you design a chair detector, it will just tell you the location of the chair in a given image. It will not tell you whether it's a red chair with a high back or a blue chair with a low back. Detecting the location of objects is a critical step in many Computer Vision systems. Consider the following image:
If you run a chair detector on this image, it will put a green box around all the chairs, but it won't tell you what kind of chair each one is. Object detection used to be a computationally intensive task because of the number of calculations required to perform the detection at various scales. To solve this, Paul Viola and Michael Jones came up with a great algorithm in their seminal 2001 paper, which you can read at https://www.cs.cmu.edu/~efros/courses/LBMV07/Papers/viola-cvpr-01.pdf. They provided a fast way to design an object detector for any object. OpenCV has modules called objdetect and xobjdetect that provide the framework to design an object detector. You can use them to develop detectors for items such as sunglasses, boots, and so on.
Machine learning
Computer Vision uses various machine learning algorithms to achieve different things. OpenCV provides a module called ml that has many machine learning algorithms bundled into it, including the Bayes classifier, K-Nearest Neighbors, Support Vector Machines, Decision Trees, Neural Networks, and so on. It also has a module called flann that contains algorithms for fast nearest neighbor searches in large datasets. Machine learning algorithms are used extensively to build systems for object recognition, image classification, face detection, visual search, and so on.
Computational photography
Computational photography refers to using advanced image processing techniques to improve the images captured by cameras. Instead of focusing on optical processes and image capture methods, computational photography uses software to manipulate visual data. Some applications include high dynamic range imaging, panoramic images, image relighting, light field cameras, and so on.
Tip
Downloading the example code
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. Instructions for running the examples are available in the README.md file in the root folder of each project.
Let's take a look at the following image:
Look at those vivid colors! This is an example of a high dynamic range image, and it wouldn't be possible to get this using conventional image capture techniques. To do this, we have to capture the same scene at multiple exposures, register those images with each other, and then blend them nicely to create this image. The photo and xphoto modules contain various algorithms pertaining to computational photography. There is also a module called stitching that provides algorithms to create panoramic images.
Shape analysis
The notion of shape is crucial in Computer Vision. We analyze visual data by recognizing the various shapes in an image, and this is an important step in many algorithms. Let's say you are trying to identify a particular logo in an image. You know that it can appear in various shapes, orientations, sizes, and so on. One good way to get started is to quantify the shape characteristics of the object. The shape module provides all the algorithms required to extract different shapes, measure the similarity between them, transform the shapes of objects, and so on.
Optical flow
Optical flow algorithms are used in videos to track features across successive frames. Let's say you want to track a particular object in a video. Running a feature extractor on each frame would be computationally expensive and hence slow, so you extract the features from the current frame once and then just track them through the successive frames. Optical flow algorithms are heavily used in video-based applications in Computer Vision. The optflow module contains a number of algorithms required to compute optical flow, and the tracking module contains more algorithms that can be used to track features.
Face and object recognition
Face recognition refers to identifying the person in a given image. This is not the same as face detection, where you identify the location of a face in the given image. So, if you want to build a practical biometric system that can recognize the person in front of the camera, you first need to run a face detector to identify the location of the face, and then run a face recognizer to identify who that person is. There is a module called face that deals with face recognition.
As discussed earlier, Computer Vision tries to model algorithms based on how humans perceive visual data. So, it's helpful to find salient regions and objects in images, which can aid applications such as object recognition, object detection and tracking, and so on. There is a module called saliency that's designed for this purpose. It provides algorithms that can detect salient regions in static images and videos.
We are increasingly interacting with devices that can capture the 3D structure of the objects around us. These devices capture depth information along with the regular 2D color images; Kinect is a good example of such a device. So, it's important for us to build algorithms that can understand and process 3D objects. The task at hand is to recognize an input 3D object by matching it against one of the models in our database. A system that can recognize and locate objects in this way can be used in many different applications. There is a module called surface_matching that contains algorithms for 3D object recognition and pose estimation using 3D features.
Text detection and recognition
Identifying text in a given scene and recognizing its content is becoming increasingly important. Some applications include license plate recognition, recognizing road signs for self-driving cars, book scanning to digitize content, and so on. There is a module called text that contains various algorithms to handle text detection and recognition.