Understanding the importance of capsule networks
Convolutional neural networks (CNNs) form the backbone of all the major breakthroughs in image detection today. CNNs work by detecting the basic features that are present in the lower layers of the network and then proceed to detect the higher level features present in the higher layers of the network. This setup does not contain a pose (translational and rotational) relationship between the lower-level features that make up any complex object.
Imagine trying to identify a face. In this case, just having eyes, nose, and ears in an image can lead a CNN to conclude that it's a face without caring about the relative orientation of the concerned objects. To explain this further, if an image has a nose above the eyes, CNNs still can detect that it's an image. CNNs take care of this problem by using max pooling, which helps increase the field of view for the higher layers. However, this operation is not a perfect solution as we tend to lose valuable...