Pruning – trimming the fat from LLMs
Pruning is an optimization technique used to streamline LLMs by systematically removing parameters (that is, weights) that have little to no impact on the output. The main objective is to create a leaner model that retains essential functionality while being more efficient to run. Let’s take a more detailed look at pruning.
Identifying redundant weights
Pruning a neural network, including an LLM, reduces the model's complexity by removing weights that contribute little to its decision-making. Here's a closer look at how redundant weights are identified and handled:
- Weight magnitude: Typically, the magnitude of a weight in a neural network indicates its importance. Smaller weights (closer to zero) have less impact on the output of the network. Therefore, weights with the smallest absolute values are often considered first for pruning...
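As a minimal sketch of magnitude-based pruning (using plain NumPy rather than any particular pruning library; the function name and API are illustrative, not standard), the idea is to zero out the fraction of weights with the smallest absolute values:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of weights with the smallest magnitudes."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to prune
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest absolute value; weights at or below it are pruned.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: prune half of a small random weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned = magnitude_prune(w, sparsity=0.5)
print(f"{(pruned == 0).mean():.0%} of weights pruned")
```

In practice, frameworks apply this per layer (or globally across layers) and typically fine-tune the model afterwards to recover any lost accuracy.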