For inference, the whole neural network has to be loaded into memory, so as mobile developers we are especially interested in small architectures that consume as little memory as possible. Small neural networks also reduce bandwidth consumption when the model is downloaded over the network.
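To make the memory and bandwidth cost concrete, here is a minimal sketch (not from the text; the parameter counts are illustrative assumptions) of how a model's weight footprint follows directly from its parameter count and numeric precision:

```python
def model_size_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of a network's weights in MiB.

    bytes_per_param is 4 for float32; quantized models use fewer bytes.
    """
    return num_params * bytes_per_param / (1024 ** 2)

# A large network with ~138M parameters stored as float32
# needs hundreds of MiB just for its weights:
print(round(model_size_mb(138_000_000), 1))

# A compact mobile architecture with ~4M parameters is far cheaper
# to hold in memory and to download:
print(round(model_size_mb(4_000_000), 1))
```

This is why shrinking the parameter count (or the bytes per parameter) directly translates into smaller downloads and a smaller in-memory footprint on device.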
Several architectures designed to reduce the size of convolutional neural networks have been proposed recently. We briefly discuss the best-known of them.