Practical Computer Vision: Extract insightful information from images using TensorFlow, Keras, and OpenCV

By Abhinav Dadhich
Book | Feb 2018 | 234 pages | 1st Edition

eBook : $29.99
Print : $38.99
Subscription : $15.99 Monthly

What do you get with eBook?

  • Instant access to your digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM free: read whenever, wherever, and however you want

Practical Computer Vision

A Fast Introduction to Computer Vision

Computer vision applications have become quite ubiquitous in our lives. They are varied, ranging from apps that play Virtual Reality (VR) or Augmented Reality (AR) games to applications for scanning documents using smartphone cameras. On our smartphones, we have QR code scanning and face detection, and now we even have facial recognition techniques. Online, we can search using images and find similar-looking images. Photo sharing applications can identify people and make an album based on the friends or family found in the photos. Thanks to improvements in image stabilization techniques, we can create stable videos even with shaky hands.

With the recent advancements in deep learning techniques, applications such as image classification, object detection, and tracking have become more accurate, and this has led to the development of more complex autonomous systems, such as drones, self-driving cars, and humanoids. Using deep learning, images can also be transformed in more complex ways; for example, a photograph can be converted into a Van Gogh-style painting.

Such progress in several domains makes a non-expert wonder how computer vision is capable of inferring this information from images. The motivation lies in human perception and the way we can perform complex analyses of the environment around us. We can estimate the closeness, structure, and shape of objects, and we can estimate the texture of a surface too. Even under different lighting, we can identify objects, and we can recognize something if we have seen it before.

Considering these advancements and motivations, one of the basic questions that arises is: what is computer vision? In this chapter, we will begin by answering this question and then provide a broader overview of the various sub-domains and applications within computer vision. Later in the chapter, we will start with basic image operations.

What constitutes computer vision?

In order to begin the discussion on computer vision, observe the following image:

Even if we have never done this activity before, we can clearly tell that the image shows people skiing in snowy mountains on a cloudy day. The information that we perceive is quite complex and can be subdivided into more basic inferences for a computer vision system.

The most basic observation that we can get from an image is of the things or objects in it. In the previous image, the various things that we can see are trees, mountains, snow, sky, people, and so on. Extracting this information is often referred to as image classification, where we would like to label an image with a predefined set of categories. In this case, the labels are the things that we see in the image.
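
As a small, hedged illustration of what such a labeling system looks like in practice (this is not the classifier built later in the book), the following sketch runs a MobileNetV2 network pretrained on ImageNet, available through Keras, on an image; the image path is assumed and TensorFlow is assumed to be installed:

import numpy as np
import cv2
from tensorflow.keras.applications.mobilenet_v2 import (
    MobileNetV2, preprocess_input, decode_predictions)

# load a network pretrained on the ImageNet categories
# (the weights are downloaded automatically on first use)
model = MobileNetV2(weights='imagenet')

# read the image with OpenCV (BGR order), convert it to RGB, and
# resize it to the 224x224 input that the network expects
img = cv2.imread('../figures/skiing.png')  # hypothetical path
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
rgb = cv2.resize(rgb, (224, 224))

# scale the pixel values and add a batch dimension
x = preprocess_input(rgb.astype(np.float32))[np.newaxis, ...]

# print the three most likely ImageNet labels for the image
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])

The printed output is a short list of label-and-confidence guesses for the whole image, which is exactly the kind of predefined-category labeling described above.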

A broader observation that we can make from the previous image concerns the landscape. We can tell that the image consists of Snow, Mountain, and Sky, as shown in the following image:

Although it is difficult to draw exact boundaries for where the Snow, Mountain, and Sky are in the image, we can still identify approximate regions of the image for each of them. This is often termed segmentation of an image, where we break it up into regions according to object occupancy.
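
Deep-learning-based segmentation is covered in Chapter 7, Segmentation and Tracking; as a much simpler, hedged illustration of grouping pixels into regions, the following sketch clusters pixel colors with OpenCV's k-means implementation. The image path and the choice of three clusters are assumptions for illustration only:

import numpy as np
import cv2

# read the image and flatten it into a list of BGR pixel values
img = cv2.imread('../figures/skiing.png')  # hypothetical path
pixels = img.reshape(-1, 3).astype(np.float32)

# cluster the pixel colors into K groups with k-means
K = 3
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, K, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)

# paint every pixel with the color of its cluster center and
# reshape the result back to the original image size
segmented = centers[labels.flatten()].astype(np.uint8).reshape(img.shape)

cv2.imshow("Segmented", segmented)
cv2.waitKey(0)
cv2.destroyAllWindows()

Each region here is defined purely by color similarity; the deep-learning approaches later in the book segment images by object instead.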

Making our observation more concrete, we can further identify the exact boundaries of objects in the image, as shown in the following figure:

In the image, we see that people are doing different activities and, as such, have different shapes; some are sitting, some are standing, and some are skiing. Even with so many variations, we can detect objects and create bounding boxes around them. Only a few bounding boxes are shown in the image for ease of understanding; we could mark many more than these.

While the image shows rectangular bounding boxes around some objects, it does not yet categorize what object is in each box. The next step would be to say that a box contains a person. This combined task of detecting objects and categorizing them is often referred to as object detection.
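
Deep-learning-based detectors are covered in Chapter 6, Feature-Based Object Detection; as a quick, hedged taste of drawing such boxes, the following sketch uses OpenCV's classic Haar cascade face detector. The image path is an assumption, and cv2.data.haarcascades is the folder of cascade files bundled with the opencv-python package:

import cv2

# load OpenCV's pretrained frontal-face Haar cascade
cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
detector = cv2.CascadeClassifier(cascade_path)

# read the image and convert it to grayscale for the detector
img = cv2.imread('../figures/people.png')  # hypothetical path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# detect faces; each detection is an (x, y, width, height) box
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# draw a green bounding box around every detected face
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow("Detections", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Each box here is only ever a face; the deep-learning detectors discussed later both localize and name many object categories.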

Extending our observation of the people and their surroundings, we can say that the different people in the image have different heights, even though some are nearer to and others are farther from the camera. We can do this because of our intuitive understanding of image formation and of the relative sizes of objects. We know that a tree is usually much taller than a person, even if the trees in the image appear shorter than the people nearer to the camera. Extracting information about the geometry in an image is another sub-field of computer vision, often referred to as image reconstruction.
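
Chapter 8, 3D Computer Vision, covers this topic properly; as a small, hedged illustration of recovering geometry from images, the following sketch computes a disparity map from a rectified stereo pair using OpenCV's block matcher. The image paths are assumptions, and the two views must already be rectified grayscale images of the same scene:

import cv2

# load a rectified stereo pair as grayscale images (hypothetical paths)
left = cv2.imread('../figures/left.png', cv2.IMREAD_GRAYSCALE)
right = cv2.imread('../figures/right.png', cv2.IMREAD_GRAYSCALE)

# the block matcher compares patches along the same scanline;
# larger disparities correspond to points closer to the camera
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right)

# normalize for display: bright pixels are near, dark pixels are far
disparity = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX)
disparity = disparity.astype('uint8')

cv2.imshow("Disparity", disparity)
cv2.waitKey(0)
cv2.destroyAllWindows()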

Computer vision is everywhere

In the previous section, we developed an initial understanding of computer vision. With this understanding, several algorithms have been developed and are used in industrial applications. Studying these not only improves our understanding of the systems but can also seed newer ideas to improve them.

In this section, we will extend our understanding of computer vision by looking at various applications and their problem formulations:

  • Image classification: In the past few years, categorizing images based on the objects within them has gained popularity. This is due to advances in algorithms as well as the availability of large datasets. Deep learning algorithms for image classification have significantly improved accuracy when trained on datasets like ImageNet. We will study this dataset further in the next chapter. The trained model is often further used to improve other recognition algorithms, such as object detection, as well as image categorization in online applications. In this book, we will see how to create a simple algorithm to classify images using deep learning models.
  • Object detection: Not just self-driving cars, but robotics, automated retail stores, traffic detection, smartphone camera apps, image filters, and many more applications use object detection. These also benefit from deep learning and vision techniques, as well as from the availability of large, annotated datasets. We saw an introduction to object detection in the previous section; it produces bounding boxes around objects and also categorizes what object is inside each box.
  • Object tracking: Following robots, surveillance cameras, and human-computer interaction are a few of the several applications of object tracking. It consists of locating an object and keeping track of the corresponding objects across a sequence of images; a minimal point-tracking sketch is shown after this list.
  • Image geometry: This is often referred to as computing the depth of objects from the camera. There are several applications in this domain too. Smartphone apps are now capable of computing three-dimensional structure from video captured on the device. Using the reconstructed three-dimensional digital models, further extensions such as AR or VR applications are developed to interface the image world with the real world.
  • Image segmentation: This is the task of clustering regions in images such that the pixels within one cluster share similar properties. The usual approach is to cluster the image pixels belonging to the same object. Recent applications have grown in self-driving cars and healthcare analysis using image regions.
  • Image generation: These techniques have had a great impact in the artistic domain, merging different image styles or generating completely new ones. We can now mix Van Gogh's painting style with smartphone camera images to create images that appear to have been painted in a similar style.
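
The book's tracking chapter uses detection-based trackers such as Deep SORT; the sketch below is a much lighter, hedged illustration of the idea of following points across frames, using Lucas-Kanade optical flow from OpenCV. The video path is an assumption:

import cv2

# open a video and grab its first frame (hypothetical path)
cap = cv2.VideoCapture('../figures/ski_video.mp4')
ok, prev_frame = cap.read()
prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)

# pick corner-like points that are easy to track
points = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100,
                                 qualityLevel=0.3, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # estimate where each point moved between the two frames
    new_points, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray,
                                                     points, None)

    # keep only the successfully tracked points and draw them
    good = new_points[status.flatten() == 1]
    for x, y in good.reshape(-1, 2):
        cv2.circle(frame, (int(x), int(y)), 3, (0, 255, 0), -1)

    cv2.imshow("Tracking", frame)
    if cv2.waitKey(30) & 0xFF == 27:  # stop when Esc is pressed
        break
    prev_gray, points = gray, good.reshape(-1, 1, 2)

cap.release()
cv2.destroyAllWindows()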

The field is evolving quickly, not only through newer methods of image analysis but also through newer applications of computer vision. Therefore, applications are not limited to those explained previously.

Developing vision applications requires significant knowledge of tools and techniques. In Chapter 2, Libraries, Development Platform, and Datasets, we will see a list of tools that help in implementing vision techniques. One of the most popular tools is OpenCV, which implements most of the common computer vision algorithms. For more recent techniques, such as deep learning, Keras and TensorFlow can be used to create applications.

Though we will see introductory image operations in the next section, Chapter 3, Image Filtering and Transformations in OpenCV, covers more elaborate image operations, such as filtering and transformations. These act as initial operations in many applications, removing unwanted information.

In Chapter 4, What is a Feature?, we will be introduced to the features of an image. There are several properties in an image such as corners, edges, and so on that can act as key points. These properties are used to find similarities between images. We will implement and understand common features and feature extractors.

The recent advances in vision techniques for image classification or object detection use advanced features that utilize deep-learning-based approaches. In Chapter 5, Convolutional Neural Networks, we will begin with understanding various components of a convolutional neural network and how it can be used to classify images.

Object detection, as explained before, is a more complex problem: it involves both localizing the position of an object in an image and saying what type of object it is. It therefore requires more complex techniques, which we will see in Chapter 6, Feature-Based Object Detection, using TensorFlow.

If we would like to know the region of an object in an image, we need to perform image segmentation. In Chapter 7, Segmentation and Tracking, we will see some techniques for image segmentation using convolutional neural networks and also techniques for tracking multiple objects in a sequence of images or video.

Finally, in Chapter 8, 3D Computer Vision, there is an introduction to image reconstruction and to applications of image geometry, such as visual odometry and visual SLAM.

Though we will introduce setting up OpenCV in detail in the next chapter, in the next section we will use OpenCV to perform the basic image operations of reading and converting images. These operations will show how an image is represented in the digital world and what needs to be changed to improve image quality. More detailed image operations are covered in Chapter 3, Image Filtering and Transformations in OpenCV.

Getting started

In this section, we will see basic image operations for reading and writing images. We will also see how images are represented digitally.

Before we proceed further with image IO, let's see what an image is made up of in the digital world. An image is simply a two-dimensional array, with each cell of the array containing an intensity value. A simple image is a black-and-white image, with 0s representing white and 1s representing black. This is also referred to as a binary image. A further extension of this is dividing black and white into a broader grayscale, with a range of 0 to 255. An image of this type, viewed in three dimensions, is as follows, where x and y are pixel locations and z is the intensity value:

This is a top view; viewing it from the side, we can see the variation in the intensities that make up the image:

We can see that there are several peaks and that the image intensities are not smooth. Let's apply a smoothing algorithm, the details of which can be seen in Chapter 3, Image Filtering and Transformations in OpenCV:

As we can see, the pixel intensities now form more continuous surfaces, even though there is no significant change in the object representation. The code to visualize this is as follows (the libraries required to visualize images are described in detail in Chapter 2, Libraries, Development Platforms, and Datasets):

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib versions
import cv2


# load and read an image from the path to the file
img = cv2.imread('../figures/building_sm.png')

# convert the color to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# resize the image (optional)
gray = cv2.resize(gray, (160, 120))

# apply a smoothing operation
gray = cv2.blur(gray, (3, 3))

# create a grid of pixel coordinates to plot against
xx, yy = np.mgrid[0:gray.shape[0], 0:gray.shape[1]]

# create the figure with a 3D set of axes
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.plot_surface(xx, yy, gray, rstride=1, cstride=1,
                cmap=plt.cm.gray, linewidth=1)
# show it
plt.show()

This code uses the following libraries: NumPy, OpenCV, and matplotlib.

In the later sections of this chapter, we will see operations on images that use their color properties. Please download the relevant images from the website to view them clearly.

Reading an image

An image stored in digital format consists of a grid structure, with each cell containing a value that represents part of the image. In the following sections, we will see different formats for images; for each format, the values stored in the grid cells lie in a different range.
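
As a tiny, hedged sketch of this idea (the sizes and values here are arbitrary), such a grid can be built directly with NumPy and displayed with OpenCV:

import numpy as np
import cv2

# a 4x4 binary image: each cell holds either 0 or 1
binary = np.array([[0, 1, 1, 0],
                   [1, 0, 0, 1],
                   [1, 0, 0, 1],
                   [0, 1, 1, 0]], dtype=np.uint8)
print(binary)

# a grayscale image: each cell holds an intensity between 0 and 255;
# here, the intensity grows smoothly from left to right
gradient = np.tile(np.arange(256, dtype=np.uint8), (100, 1))

cv2.imshow("Gradient", gradient)
cv2.waitKey(0)
cv2.destroyAllWindows()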

To manipulate an image or use it for further processing, we need to load it into such a grid-like structure. This is referred to as an image input-output (IO) operation, and we can use the OpenCV library to read an image, as follows. Here, change the path to the image file according to your setup:

import cv2

# load and read an image from the path to the file
img = cv2.imread('../figures/flower.png')

# display the image in a window
cv2.imshow("Image", img)

# keep the window open until a key is pressed
cv2.waitKey(0)

# close all OpenCV windows
cv2.destroyAllWindows()

The resulting image is shown in the following screenshot:

Here, we read the image in BGR color format, where B is blue, G is green, and R is red. Each pixel in the output is represented collectively by the values of each of these colors. An example of a pixel location and its color values is shown at the bottom of the previous figure.
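
As a brief, hedged follow-up showing how the loaded grid can be inspected and written back to disk (the pixel coordinates and the output filename are arbitrary choices):

import cv2

# load and read an image from the path to the file
img = cv2.imread('../figures/flower.png')

# the image is a NumPy array of rows x columns x channels with 8-bit values
print(img.shape, img.dtype)

# a single pixel, indexed as img[row, column], holds its [B, G, R] values
print(img[100, 50])

# swap the channel order to RGB when a library such as matplotlib expects it
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# write the image back to disk; the format is chosen from the file extension
cv2.imwrite('../figures/flower_copy.png', img)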

Image color conversions

An image is made up of pixels and is usually visualized according to the values stored in them. There is also an additional property that makes for different kinds of images: each value stored in a pixel is linked to a fixed interpretation. For example, a pixel value of 10 can represent a gray intensity of 10, or a blue intensity of 10, and so on. It is therefore important to understand the different color types and the conversions between them. In this section, we will see color types and conversions using OpenCV:

  • Grayscale: This is a simple single-channel image, with values ranging from 0 to 255 that represent the intensity of the pixels. The previous image can be converted to grayscale, as follows:
import cv2

# load and read an image from the path to the file
img = cv2.imread('../figures/flower.png')

# convert the color to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# display the grayscale image in a window
cv2.imshow("Image", gray)

# keep the window open until a key is pressed
cv2.waitKey(0)

# close all OpenCV windows
cv2.destroyAllWindows()

The resulting image is as shown in the following screenshot:

  • HSV and HLS: These are alternative representations of color, where H is hue, S is saturation, V is value, and L is lightness. They are motivated by the way humans perceive color. An example of converting an image to each of them is as follows:
# convert the color to HSV
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# convert the color to HLS
hls = cv2.cvtColor(img, cv2.COLOR_BGR2HLS)

The conversion is shown in the following figure, where an input image read in BGR format is converted to the HLS (left) and HSV (right) color types:

  • LAB color space: Denoted L for lightness, A for the green-red components, and B for the blue-yellow components, this color space covers all perceivable colors. It is used to convert from one type of color space (for example, RGB) to others (such as CMYK) because of its device-independence properties. On a device whose format differs from that of the image being sent, the incoming image is first converted to LAB and then to the corresponding space available on the device. The output of converting an RGB image is as follows:
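
As with the previous conversions, the call itself is a single cvtColor invocation; a minimal sketch, with the image path assumed as before:

import cv2

# load and read an image from the path to the file
img = cv2.imread('../figures/flower.png')

# convert the color to the LAB space
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)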

Computer vision research conferences

Some of the conferences to follow for the latest research and applications are as follows:

  • CVPR: The Conference on Computer Vision and Pattern Recognition is held every year and is one of the most popular conferences, with research papers ranging from theory to applications across a wide range of domains
  • ICCV: The International Conference on Computer Vision is another major conference, held every other year, that attracts some of the best research papers
  • SIGGRAPH: The Special Interest Group on Computer Graphics and Interactive Techniques, though more focused on the computer graphics domain, features several application papers that utilize computer vision techniques

Other notable conferences include Neural Information Processing Systems (NIPS), International Conference on Machine Learning (ICML), Asian Conference on Computer Vision (ACCV), European Conference on Computer Vision (ECCV), and so on.

Summary

In this chapter, we saw a brief overview of computer vision along with basic IO operations on images. Though it is a vast field, there are always exciting applications that can be built using computer vision techniques. This book tries to bridge the gap between theory and a practical approach to several of the popular algorithms. Further on in this book, we will begin by understanding more basic image operations that perform filtering and transformations. Extending these basic techniques, we will then see what comprises a feature and how to compute one.

Following this introduction to computer vision, we will start setting up libraries and a development environment in the next chapter. These libraries will be used throughout the book. The datasets introduced in the next chapter can serve as a starting point for several of the algorithms that follow.


Key benefits

  • Master the different tasks associated with Computer Vision and develop your own Computer Vision applications with ease
  • Leverage the power of Python, TensorFlow, Keras, and OpenCV to perform image processing, object detection, feature detection, and more
  • With real-world datasets and fully functional code, this book is your one-stop guide to understanding Computer Vision

Description

In this book, you will find several recently proposed methods in various domains of computer vision. You will start by setting up the proper Python environment to work on practical applications. This includes setting up libraries such as OpenCV, TensorFlow, and Keras using Anaconda. Using these libraries, you'll start to understand the concepts of image transformation and filtering. You will find a detailed explanation of feature detectors such as FAST and ORB; you'll use them to find similar-looking objects. With an introduction to convolutional neural nets, you will learn how to build a deep neural net using Keras and how to use it to classify the Fashion-MNIST dataset. With regard to object detection, you will learn the implementation of a simple face detector as well as the workings of complex deep-learning-based object detectors such as Faster R-CNN and SSD using TensorFlow. You'll get started with semantic segmentation using FCN models and track objects with Deep SORT. Not only this, you will also use Visual SLAM techniques such as ORB-SLAM on a standard dataset. By the end of this book, you will have a firm understanding of the different computer vision techniques and how to apply them in your applications.

What you will learn

  • Learn the basics of image manipulation with OpenCV
  • Implement and visualize image filters such as smoothing, dilation, histogram equalization, and more
  • Set up various libraries and platforms, such as OpenCV, Keras, and TensorFlow, in order to start using computer vision, along with appropriate datasets for each chapter, such as MSCOCO, MOT, and Fashion-MNIST
  • Understand image transformation and downsampling with practical implementations
  • Explore neural networks for computer vision and convolutional neural networks using Keras
  • Understand working with deep-learning-based object detection such as Faster-R-CNN, SSD, and more
  • Explore deep-learning-based object tracking in action
  • Understand Visual SLAM techniques such as ORB-SLAM

Product Details

Publication date : Feb 5, 2018
Length : 234 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781788297684


Table of Contents

12 Chapters
Preface
1. A Fast Introduction to Computer Vision
2. Libraries, Development Platform, and Datasets
3. Image Filtering and Transformations in OpenCV
4. What is a Feature?
5. Convolutional Neural Networks
6. Feature-Based Object Detection
7. Segmentation and Tracking
8. 3D Computer Vision
9. Mathematics for Computer Vision
10. Machine Learning for Computer Vision
11. Other Books You May Enjoy
