
You're reading from Building Computer Vision Projects with OpenCV 4 and C++: Implement complex computer vision algorithms and explore deep learning and face detection

Product type: Course
Published: Mar 2019
ISBN-13: 9781838644673
Length: 538 pages
Edition: 1st Edition
Languages: C++
Tools: OpenCV
Concepts: Computer Vision
Authors (4): Roy Shilkrot, David Millán Escrivá, Vinícius G. Mendonça, Prateek Joshi

How do humans understand image content?

If you look around, you will see a lot of objects. You encounter many different objects every day, and you recognize them almost instantaneously without any effort. When you see a chair, you don't wait for a few minutes before realizing that it is in fact a chair. You just know that it's a chair right away.

Computers, on the other hand, find this task very difficult. Researchers have spent many years trying to understand why computers are not as good at it as we are.

To answer that question, we need to understand how humans do it. Object recognition is handled by the ventral visual stream, the pathway in our visual system that is associated with recognizing objects. It is essentially a hierarchy of areas in the brain that helps us identify what we are looking at.

Humans recognize different objects effortlessly and can group similar objects together. We can do this because we have developed a kind of invariance toward objects of the same class: when we look at an object, our brain extracts its salient features in such a way that factors such as orientation, size, perspective, and illumination don't matter.

A chair that is twice the normal size and rotated by 45 degrees is still a chair, and we recognize it easily because of the way we process what we see. Humans tend to remember an object by its shape and important features, so we can recognize it regardless of how it is placed. Machines cannot do this so easily.
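
Machines can be given some of this robustness by design. One classical approach, shown here as an illustration rather than as the book's method, is a hand-crafted shape descriptor that is invariant to translation, scale, and rotation. OpenCV exposes such descriptors through cv::moments and cv::HuMoments; the minimal plain C++ sketch below recomputes only the first Hu invariant, phi1 = eta20 + eta02, so it runs without OpenCV. The BinaryImage type and the huPhi1, rotate90, and scale2 functions are hypothetical names introduced for this example, and it rotates by 90 degrees rather than 45 because a 45-degree rotation of a pixel grid needs interpolation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical binary image, stored row-major: 1 = object pixel, 0 = background.
struct BinaryImage {
    int width;
    int height;
    std::vector<int> pixels;  // size must equal width * height
    int at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// First Hu moment invariant, phi1 = eta20 + eta02.
// Central moments (measured from the centroid) give translation invariance,
// dividing by m00^2 gives scale invariance for second-order moments,
// and the sum eta20 + eta02 does not change under rotation.
double huPhi1(const BinaryImage& img) {
    assert(img.pixels.size() == static_cast<std::size_t>(img.width) * img.height);
    double m00 = 0, m10 = 0, m01 = 0;
    for (int y = 0; y < img.height; ++y)
        for (int x = 0; x < img.width; ++x)
            if (img.at(x, y)) { m00 += 1; m10 += x; m01 += y; }
    assert(m00 > 0);  // the shape needs at least one object pixel
    const double cx = m10 / m00, cy = m01 / m00;
    double mu20 = 0, mu02 = 0;
    for (int y = 0; y < img.height; ++y)
        for (int x = 0; x < img.width; ++x)
            if (img.at(x, y)) {
                mu20 += (x - cx) * (x - cx);
                mu02 += (y - cy) * (y - cy);
            }
    const double norm = m00 * m00;  // m00^(1 + (p + q) / 2) with p + q = 2
    return (mu20 + mu02) / norm;
}

// Rotate 90 degrees clockwise: pixel (x, y) moves to column height - 1 - y, row x.
BinaryImage rotate90(const BinaryImage& img) {
    BinaryImage out{img.height, img.width, std::vector<int>(img.pixels.size(), 0)};
    for (int y = 0; y < img.height; ++y)
        for (int x = 0; x < img.width; ++x)
            out.pixels[static_cast<std::size_t>(x) * out.width + (img.height - 1 - y)] = img.at(x, y);
    return out;
}

// Double the size by replicating each pixel into a 2x2 block.
BinaryImage scale2(const BinaryImage& img) {
    BinaryImage out{img.width * 2, img.height * 2,
                    std::vector<int>(img.pixels.size() * 4, 0)};
    for (int y = 0; y < out.height; ++y)
        for (int x = 0; x < out.width; ++x)
            out.pixels[static_cast<std::size_t>(y) * out.width + x] = img.at(x / 2, y / 2);
    return out;
}
```

Rotating the shape by 90 degrees only swaps mu20 and mu02, so phi1 is unchanged. Doubling the size by pixel replication leaves phi1 only approximately unchanged: measuring moments at pixel centers adds a bias of 1/(8N) for a shape of N pixels, which shrinks as the shape grows.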

Our visual system builds up these invariances hierarchically, with respect to position, scale, and viewpoint, and this makes our recognition very robust. If you look deeper into the system, you will find cells in the visual cortex that respond to simple shapes such as lines and curves.
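
In image processing, the closest everyday counterpart to such cells is an oriented filter: a small kernel that responds strongly to one local pattern. The sketch below is only a loose analogy, not a model of cortical cells. It uses the textbook 3x3 line-detection masks and applies them in plain C++; in OpenCV, the same kernels could be applied to a whole cv::Mat with cv::filter2D. The names Kernel3, kHorizontalLine, kVerticalLine, and respond are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <vector>

using Kernel3 = std::array<std::array<int, 3>, 3>;

// Classic 3x3 line-detection masks. Each responds most strongly to a
// one-pixel-wide line in its own orientation, and its weights sum to zero.
const Kernel3 kHorizontalLine = {{{-1, -1, -1}, {2, 2, 2}, {-1, -1, -1}}};
const Kernel3 kVerticalLine = {{{-1, 2, -1}, {-1, 2, -1}, {-1, 2, -1}}};

// Correlate the kernel with the 3x3 neighbourhood centred on (cx, cy).
// (cx, cy) must be at least one pixel away from the image border.
int respond(const std::vector<std::vector<int>>& img, const Kernel3& k,
            int cx, int cy) {
    assert(cy >= 1 && cy + 1 < static_cast<int>(img.size()));
    assert(cx >= 1 && cx + 1 < static_cast<int>(img[cy].size()));
    int sum = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            sum += k[dy + 1][dx + 1] * img[cy + dy][cx + dx];
    return sum;
}
```

On a one-pixel-wide vertical line, the vertical mask returns 6 at the line's center while the horizontal mask returns 0, and the roles reverse for a horizontal line. Because each mask's weights sum to zero, neither responds to a uniform region.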

As we move further along the ventral stream, we find more complex cells that are tuned to respond to more complex objects, such as trees, gates, and so on. Neurons along the ventral stream tend to have progressively larger receptive fields, and the complexity of their preferred stimuli increases as well.
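
Stacked filters in computer vision show a similar growth of receptive fields, and the arithmetic is simple enough to sketch. Assuming a hypothetical hierarchy of square filters, each described by its kernel size and stride (for example, the layers of a convolutional network), the input region that can influence a single top-level unit grows at each layer by (kernel - 1) times the product of the strides of all layers below it. The Layer struct and receptiveField function are illustrative names, and the sketch makes no claim about the actual sizes of cortical receptive fields.

```cpp
#include <cassert>
#include <vector>

// One layer of a hypothetical filter hierarchy: a square kernel of side
// `kernel`, applied with step `stride`.
struct Layer {
    int kernel;
    int stride;
};

// Side length, in input pixels, of the region that can influence one unit
// at the top of the stack.
int receptiveField(const std::vector<Layer>& layers) {
    int field = 1;  // a raw pixel "sees" only itself
    int jump = 1;   // input-pixel distance between neighbouring units
    for (const Layer& l : layers) {
        assert(l.kernel >= 1 && l.stride >= 1);
        field += (l.kernel - 1) * jump;  // widen by the kernel's extra reach
        jump *= l.stride;                // strides spread later units apart
    }
    return field;
}
```

For example, three stacked 3x3 filters with stride 1 give a 7x7 field, and interleaving two 2x2, stride-2 pooling steps (filter, pool, filter, pool, filter) grows the top unit's field to 18x18 input pixels.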

Why is it difficult for machines to understand image content?

We now have a rough picture of how visual data enters the human visual system and how our system processes it. The issue is that we still don't fully understand how our brain recognizes and organizes this visual data. In machine learning, we extract some features from images and ask computers to learn from them using algorithms, but those features still vary with shape, size, perspective, angle, illumination, occlusion, and so on.

For example, the same chair looks very different to a machine when viewed in profile. Humans can easily recognize that it's a chair, regardless of how it's presented to us. So how do we explain this to our machines?

One way to do this would be to store every variation of an object, across sizes, angles, perspectives, and so on. But this process is cumbersome and time-consuming, and in practice it is not possible to gather data that covers every single variation. Building a model that can recognize these objects would also consume a huge amount of memory and time.
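
A back-of-the-envelope calculation shows why. The sketch below multiplies out a deliberately coarse, hypothetical grid of variations (36 rotations, 10 scales, 20 viewpoints, and 10 illumination levels) by the raw size of uncompressed 640x480 RGB images. All of these numbers are assumptions chosen for illustration, and VariationGrid, viewsPerObject, and bytesPerObject are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical number of distinct settings stored for each factor.
struct VariationGrid {
    std::uint64_t rotations;
    std::uint64_t scales;
    std::uint64_t viewpoints;
    std::uint64_t illuminations;
};

// Every combination of factors is a separate stored view.
std::uint64_t viewsPerObject(const VariationGrid& g) {
    return g.rotations * g.scales * g.viewpoints * g.illuminations;
}

// Raw storage for all views of one object, as uncompressed 8-bit images.
std::uint64_t bytesPerObject(const VariationGrid& g, std::uint64_t width,
                             std::uint64_t height, std::uint64_t channels) {
    assert(width > 0 && height > 0 && channels > 0);
    return viewsPerObject(g) * width * height * channels;
}
```

Even this coarse grid gives 36 × 10 × 20 × 10 = 72,000 views of a single object, and at 640 × 480 × 3 = 921,600 bytes per view that is 66,355,200,000 bytes, roughly 66 GB, before occlusion or background clutter is taken into account.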

Even with all this, a partially occluded object may still not be recognized, because its appearance does not match any stored variation and the machine treats it as a new object. So when we build a computer vision library, we need to build underlying functional blocks that can be combined in many different ways to formulate complex algorithms.

OpenCV provides many of these building blocks, and they are highly optimized. Once we understand what OpenCV is capable of, we can use it effectively to build interesting applications.

Let's go ahead and explore that in the next section.
