Chapter 4. Computer Vision
Computer vision is the branch of engineering that gives computers eyes. It powers many kinds of image processing, such as face recognition on the iPhone, Google Lens, and so on. Computer vision has been around for decades, and it is arguably best explored with the help of artificial intelligence, as this chapter demonstrates.
Computer vision systems reached human-level accuracy on the ImageNet image-classification challenge years ago. The field has changed enormously in the last decade, moving from academically oriented object detection problems to segmentation problems that self-driving cars solve on real roads. Although researchers have proposed many different network architectures for computer vision problems, convolutional neural networks (CNNs) have outperformed all of them.
In this chapter, we will discuss basic CNNs built on PyTorch and variants of them that have been successfully used in some state-of-the-art models powering several applications...
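As a rough illustration of what a basic CNN looks like in PyTorch, here is a minimal sketch. It assumes batches of 28x28 single-channel (grayscale) images and 10 output classes. The class name `TinyCNN` and every layer size are hypothetical choices made for this example, not the architecture developed in this chapter. The network stacks two convolution, ReLU, and max-pooling blocks to extract spatial features, then flattens them and feeds a single linear layer that produces one score per class.

```python
import torch
import torch.nn as nn


class TinyCNN(nn.Module):
    """Hypothetical minimal CNN: two conv blocks followed by a linear classifier."""

    def __init__(self, num_classes: int = 10) -> None:
        super().__init__()
        self.features = nn.Sequential(
            # 1 input channel -> 16 feature maps; padding=1 keeps the 28x28 size
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 28x28 -> 14x14
            # 16 -> 32 feature maps, spatial size unchanged by the convolution
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 14x14 -> 7x7
        )
        # 32 channels of 7x7 feature maps are flattened into one vector per image
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)  # keep the batch dimension
        return self.classifier(x)  # raw class scores (logits)


model = TinyCNN()
images = torch.randn(4, 1, 28, 28)  # a random batch of 4 grayscale images
logits = model(images)
print(logits.shape)  # torch.Size([4, 10])
```

The convolution layers share their small 3x3 weights across every position in the image, which is what lets a CNN detect the same pattern wherever it appears while using far fewer parameters than a fully connected network on raw pixels.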