Convolution is a way of extracting features from an image so that we can more easily classify it based on known features. Before we get into convolution, let's take a step back and understand why networks, and our own vision for that matter, need to isolate features in an image. Take a look at the following sample image of a dog called Sadie, with various image filters applied:
Example of an image with different filters applied
The preceding image shows four versions of the picture: one with no filter, and one each with the edge detection, pixelate, and glowing edges filters applied. In every case, you as a human can clearly recognize a picture of a dog, regardless of the filter applied. Note, however, that in the edge detection case we have eliminated the extra image data that is unnecessary for recognizing a dog. By using a filter, we can extract just...
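To make the idea of an edge detection filter more concrete, here is a minimal sketch in plain Python. It rests on a few assumptions: the text does not say which edge detection filter was used on the sample image, so the 3x3 Laplacian-style `EDGE_KERNEL`, the `convolve2d` helper, and the tiny grayscale `image` are all hypothetical names and values chosen purely for illustration. The sketch slides the kernel over the image and sums the weighted pixels under it, which is the operation deep learning libraries usually call convolution.

```python
# Hypothetical 3x3 Laplacian-style edge kernel (an assumption; the text
# does not specify which edge detection filter was applied to the image).
# Its weights sum to zero, so flat regions produce an output of zero.
EDGE_KERNEL = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]


def convolve2d(image, kernel):
    """Slide `kernel` over `image` (a list of rows) with no padding.

    The kernel is not flipped, which matches the deep learning convention;
    for a symmetric kernel like EDGE_KERNEL the result is the same either way.
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Weighted sum of the image patch currently under the kernel.
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Tiny made-up grayscale image: dark left half, bright right half.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

edges = convolve2d(image, EDGE_KERNEL)
for row in edges:
    print(row)  # each row prints [0, -27, 27, 0]
```

The flat dark and flat bright regions both map to 0, while the only nonzero responses sit on the boundary between them. In this toy case, that is the sense in which the filter keeps the outline and drops the rest of the image data.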