Network pruning – eliminating unnecessary connections within the network
Network pruning is an optimization process that eliminates unnecessary connections from a network. This technique can be applied after training, but applying it during training further reduces the resulting drop in model accuracy. With fewer connections, fewer weights need to be stored, so we can reduce both the model size and the inference latency. In the following sections, we will describe how to apply network pruning in TF and PyTorch.
Network pruning in TensorFlow
Like model quantization and weight sharing, network pruning for TF is available through the TensorFlow Model Optimization Toolkit. Therefore, the first thing you need to do is import the toolkit with the following line of code:
import tensorflow_model_optimization as tfmot
To apply network pruning during training, you must modify your model using the tfmot.sparsity.keras.prune_low_magnitude function:
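The following is a minimal sketch of how such a modification could look. The small Keras classifier, the PolynomialDecay sparsity targets, and the commented-out training call are placeholder assumptions for illustration, not an example from this book:

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# A small placeholder Keras model; substitute your own model here.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Example pruning schedule: sparsity grows from 0% to 50% of the weights
# between training steps begin_step and end_step (values are assumptions).
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0,
    final_sparsity=0.5,
    begin_step=0,
    end_step=1000,
)

# Wrap the model so that low-magnitude weights are pruned during training.
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=pruning_schedule)

pruned_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
)

# The UpdatePruningStep callback advances the pruning wrappers' internal
# step counter during training, so it must be passed to fit().
callbacks = [tfmot.sparsity.keras.UpdatePruningStep()]
# pruned_model.fit(x_train, y_train, epochs=2, callbacks=callbacks)

# After training, strip the pruning wrappers to obtain a plain Keras model
# whose weight tensors contain the pruned (zeroed) entries.
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)

Note that the sparse weights only translate into a smaller stored model once the exported file is compressed (for example, with gzip) or converted with a format that exploits sparsity, since the zeroed entries are otherwise stored like any other weight.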