A lot of work has been done over the years in using neural networks to perform image recognition, and along the way, a technique called the convolutional neural network was developed to provide a better method of identifying features in images. This technique works by running a convolution step across the image, as shown in the following diagram:
CNN operation extracting features from an image
What is happening here is that a convolution matrix set by stride is multiplied across the image using a convolution step in order to generate a feature map. We do this in order to isolate features in an image by isolating sections of pixels and applying a grouping filter. If we didn't do this, our network would evaluate the image's raw pixels, which would make recognizing important features in an image difficult. It's not unlike looking at...