Parameter Efficient Fine-Tuning
Fine-tuning has become a prevalent modeling paradigm in AI, particularly in transfer learning. All of the experiments in this book up to this chapter have been based on updating all model parameters. From this point on, we will therefore refer to this approach as full fine-tuning (also called full model fine-tuning or full parameter fine-tuning).
In this chapter, we will look at partial fine-tuning strategies. As the parameter counts of large language models (LLMs) continue to grow, both fine-tuning and inference become prohibitively expensive. Full fine-tuning requires updating every parameter and saving a separate copy of the large model for each task. Unfortunately, this process is computationally expensive in terms of both memory and running time. For example, Bidirectional Encoder Representations from Transformers (BERT) has up to 340 million parameters in its large variant, T5 scales up to 11 billion, GPT-3 has 175 billion, and Pathways Language Model (PaLM) reaches 540 billion.
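To make these costs concrete, the minimal sketch below (assuming PyTorch and the Hugging Face Transformers library, with bert-large-uncased used purely as an illustrative checkpoint) counts the parameters a full fine-tune would have to update and store per task, and then freezes all but one module to preview the partial fine-tuning idea developed in this chapter.

```python
from transformers import AutoModel

# Illustrative checkpoint; any pretrained model would make the same point.
model = AutoModel.from_pretrained("bert-large-uncased")

# In full fine-tuning, every parameter is trainable and a full copy of the
# model (plus optimizer state) must be kept for each downstream task.
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total parameters:     {total:,}")
print(f"Trainable parameters: {trainable:,}")  # equal to total in full fine-tuning

# Freezing most of the network is the simplest form of partial fine-tuning:
# here only the pooler module remains trainable, so far fewer parameters
# need to be updated and saved per task.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("pooler")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable after freezing: {trainable:,}")
```

The gap between the last two printed numbers is exactly what parameter-efficient methods exploit: training and storing a small fraction of the weights per task instead of the whole model.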