Capsule networks (CapsNets) were introduced by Geoffrey Hinton to overcome the limitations of convolutional networks.
Hinton stated the following:
"The pooling operation used in convolutional neural networks is a big mistake and the fact that it works so well is a disaster."
But what is wrong with the pooling operation? Remember when we used the pooling operation to reduce the dimension and to remove unwanted information? The pooling operation makes our CNN representation invariant to small translations in the input.
This translation invariance property of a CNN is not always beneficial, and can be prone to misclassifications. For example, let's say we need to recognize whether an image has a face; the CNN will look for whether the image has eyes, a nose, a mouth, and ears. It does not care about which location they are in. If it finds all such...