Toward a deep learning approach
While playing with handwritten digit recognition, we came to the conclusion that the closer we get to an accuracy of 99%, the more difficult it is to improve. If we want more improvement, we definitely need a new idea. What are we missing? Think about it.
The fundamental intuition is that in our examples so far, we are not making use of the local spatial structure of images, which means we will use the fact that an image can be described as a matrix with data locality. In particular, this piece of code transforms the bitmap representing each written digit into a flat vector where the local spatial structure (the fact that some pixels are closer to each other) is gone:
# X_train is 60000 rows of 28x28 values; we --> reshape it as in 60000 x 784.
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
However, this is not how our brain works. Remember that our vision is based on multiple cortex levels, each one recognizing...