In this chapter, we introduce the first of several specialized deep learning architectures that we will cover in part four. Deep Convolutional Neural Networks (CNNs), also known as ConvNets, have enabled superhuman performance in classifying images, video, speech, and audio. Recurrent nets, the subject of the following chapter, have performed exceptionally well on sequential data such as text and speech.
CNNs are named after the linear algebra operation called convolution, which replaces the general matrix multiplication typical of feedforward networks (see Chapter 16, Deep Learning) in at least one of their layers. We will discuss how convolutions work and why they are particularly useful for data with a grid-like structure, such as images and time series.
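To make the idea concrete, the following minimal NumPy sketch (not taken from the chapter's code) shows a one-dimensional convolution of the kind used in CNN layers, which is technically a cross-correlation: a small kernel slides over the input and computes a dot product at each position, rather than multiplying the entire input by a dense weight matrix as a feedforward layer would. The function name conv1d and the toy series and kernel are illustrative assumptions.

```python
import numpy as np

def conv1d(series, kernel):
    """Valid (no padding) 1D cross-correlation of a series with a kernel."""
    k = len(kernel)
    # Slide the kernel over the series and take a dot product at each position.
    return np.array([np.dot(series[i:i + k], kernel)
                     for i in range(len(series) - k + 1)])

# Hypothetical toy example: a difference kernel highlights local changes,
# illustrating how a small set of shared weights is reused across positions.
series = np.array([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
kernel = np.array([1.0, -1.0])
print(conv1d(series, kernel))  # [-1. -2. -3. -4. -5.]
```

Because the same few kernel weights are applied at every position, the layer has far fewer parameters than a fully connected layer over the same input, and it detects the same local pattern wherever it occurs in the series or image.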
Research into CNN architectures has proceeded very rapidly, and new architectures that...