At the heart of the MobileNets architecture lies depth-wise separable convolution: each standard convolution of a conventional CNN is replaced by a depth-wise convolution followed by a point-wise (1×1) convolution. So, let's first see what depth-wise separable convolution is in the next sub-section.
Architecture of MobileNets
Depth-wise separable convolution
As the name suggests, depth-wise separable convolution has to do with the depth (the channel dimension) of feature maps rather than their width and height. Recall that when we used a filter over the input image in a CNN, the filter covered all the channels of the image (say, the three RGB channels of a color image). No matter how many channels were present in the input...
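To make the point about a standard filter covering every input channel concrete, here is a minimal NumPy sketch. It is only an illustration, not how any framework implements convolution: the function name `conv2d_single_filter`, the 8×8 input size, and the channel counts are all hypothetical choices, and it assumes stride 1 with no padding. It shows that a single filter must have the same depth as its input, and that it produces one 2-D feature map whether the input has 3 channels or 16.

```python
import numpy as np

def conv2d_single_filter(image, kernel):
    """Slide one filter over a multi-channel image (stride 1, no padding).

    image:  array of shape (H, W, C)
    kernel: array of shape (k, k, C); its depth must equal the image's channel count
    Returns a 2-D feature map of shape (H - k + 1, W - k + 1).
    Note: like most CNN libraries, this computes cross-correlation (no kernel flip).
    """
    H, W, C = image.shape
    k = kernel.shape[0]
    assert kernel.shape == (k, k, C), "filter depth must equal input channels"
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The patch spans all C channels; multiply element-wise and sum everything.
            out[i, j] = np.sum(image[i:i + k, j:j + k, :] * kernel)
    return out

rng = np.random.default_rng(0)
rgb = rng.random((8, 8, 3))     # hypothetical 8x8 RGB image
deep = rng.random((8, 8, 16))   # hypothetical 8x8 input with 16 channels

out_rgb = conv2d_single_filter(rgb, rng.random((3, 3, 3)))
out_deep = conv2d_single_filter(deep, rng.random((3, 3, 16)))
print(out_rgb.shape, out_deep.shape)  # (6, 6) (6, 6)
```

In both cases the filter collapses the channel dimension into a single output map; only the filter's own depth (and therefore its parameter count) grows with the number of input channels.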