Other techniques exist, but they can be harder to implement: there is no straightforward recipe for applying them, since they rely mostly on trial and error.
The first one, channel pruning, consists of removing some convolutional filters or channels. Convolutional layers typically have between 16 and 512 filters. At the end of training, it often turns out that some of them contribute little, and we can remove them to avoid storing weights that do not help the model's performance.
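To make this concrete, here is a minimal sketch of one common way to do this: rank filters by their L1 norm and keep only the strongest ones. The function name, the tensor layout (out_channels, in_channels, kH, kW), and the keep_ratio default are illustrative assumptions, not a prescribed recipe.

```python
import numpy as np

def prune_filters(conv_weights: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Keep the filters with the largest L1 norms; drop the rest.

    conv_weights: assumed shape (out_channels, in_channels, kH, kW).
    keep_ratio: fraction of filters to retain (illustrative value).
    """
    # The L1 norm of each filter is a rough proxy for how much it contributes.
    norms = np.abs(conv_weights).reshape(conv_weights.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(keep_ratio * conv_weights.shape[0]))
    keep_idx = np.argsort(norms)[-n_keep:]      # indices of the strongest filters
    return conv_weights[np.sort(keep_idx)]      # pruned weight tensor, original order preserved

# Example: a layer with 64 filters of size 3x3 over 16 input channels.
weights = np.random.randn(64, 16, 3, 3)
pruned = prune_filters(weights, keep_ratio=0.5)
print(pruned.shape)  # (32, 16, 3, 3)
```

In practice the pruned layer usually needs some fine-tuning afterwards, and the choice of which filters to drop (magnitude, activation statistics, or other criteria) is exactly the trial-and-error part mentioned above.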
The second one is called weight sparsification. Instead of storing weights for the whole matrix, we store only the ones that are deemed important, that is, the ones that are not close to zero.
For instance, instead of storing a weight vector such as [0.1, 0.9, 0.05, 0.01, 0.7, 0.001], we could keep only the weights above some threshold. The result is a list of tuples in the form (position, value). In our example, it would be [(1, 0.9), (4, 0.7)]. If many of the vector's values are close to zero, this representation takes far less space than the dense vector; if most values are significant, it can actually cost more, since each surviving weight now needs a position stored alongside it.
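A short sketch of this encoding is shown below; the function name and the 0.5 threshold are illustrative choices, not part of any particular library.

```python
def sparsify(weights, threshold=0.5):
    """Keep only entries whose magnitude exceeds the threshold, as (position, value) tuples."""
    return [(i, w) for i, w in enumerate(weights) if abs(w) > threshold]

weights = [0.1, 0.9, 0.05, 0.01, 0.7, 0.001]
print(sparsify(weights))  # [(1, 0.9), (4, 0.7)]
```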