Let's say we have scanned images of several handwritten digits and want to build software that recognizes the digit in a given scan. For simplicity, assume each image contains exactly one digit. The target software takes an image as input and outputs the number it depicts. We could write an algorithm as a series of hand-crafted checks, such as: if the image contains a single vertical line, output 1; if it contains an oval shape, output 0. However, this is a naive and brittle solution, because other digits, such as 7 and 9, also contain vertical lines. The following figure explains the overall process, taking in one of the samples from the MNIST handwritten digit dataset:
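To make the brittleness of such hand-crafted checks concrete, here is a minimal sketch of a rule-based classifier on a toy binary image. The rules ("a single column of dark pixels means 1", "a dark ring with a blank interior means 0") are hypothetical, invented only for illustration, and they fail for most real digits:

```python
import numpy as np

def naive_classify(img):
    """Toy rule-based digit classifier on a binary (0/1) image array.

    Hypothetical rules for illustration only:
    - if all dark pixels fall in a single column, call it a 1
    - if dark pixels surround a blank interior region, call it a 0
    - otherwise, give up and return None
    """
    rows, cols = np.nonzero(img)
    if len(cols) == 0:
        return None
    # Rule for "1": every dark pixel lies in the same column.
    if cols.min() == cols.max():
        return 1
    # Rule for "0": the bounding box of the dark pixels has a blank interior,
    # suggesting an oval-like ring.
    r0, r1 = rows.min(), rows.max()
    c0, c1 = cols.min(), cols.max()
    interior = img[r0 + 1:r1, c0 + 1:c1]
    if interior.size > 0 and (interior == 0).any():
        return 0
    return None

# A "1": a single vertical stroke.
one = np.zeros((5, 5), dtype=int)
one[:, 2] = 1

# A "0": a ring of dark pixels around a blank center.
zero = np.ones((5, 5), dtype=int)
zero[1:4, 1:4] = 0

print(naive_classify(one))   # 1
print(naive_classify(zero))  # 0
```

These rules work on the two toy inputs above, but a slightly slanted 1, or a 7 whose horizontal top bar encloses some blank space, would immediately break them, which is exactly why we turn to learned models instead of hand-written checks.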
There are several ways to model such a problem. We know that an image is made up of arrays of pixels and each pixel...