Exploring SFT and its techniques
SFT consists of further training a pre-trained model on a smaller dataset composed of instruction-answer pairs. The goal of SFT is to turn a base model, which can only perform next-token prediction, into a useful assistant capable of answering questions and following instructions. SFT can also be used to improve the overall performance of the base model (general-purpose SFT), instill new knowledge (such as new languages or domains), specialize it for particular tasks, make it adopt a specific voice, and so on.
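To make this concrete, the following is a minimal sketch of an SFT run using the TRL library's SFTTrainer, assuming recent versions of the trl and datasets packages. The single instruction-answer pair, the model name, and the hyperparameters are all placeholders chosen for illustration, not a recommended configuration.

```python
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Toy SFT dataset: each sample is an instruction-answer pair stored as a
# list of chat messages. A real dataset would contain thousands of pairs.
train_dataset = Dataset.from_list([
    {"messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ]},
])

# Placeholder base model and hyperparameters, for illustration only.
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=train_dataset,
    args=SFTConfig(output_dir="./sft-output", max_steps=100),
)
trainer.train()
```

Under the hood, the trainer formats each conversation with the model's chat template and optimizes the usual next-token prediction objective on the resulting text, which is exactly the mechanism by which instruction-answer pairs reshape a base model's behavior.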
In this section, we will discuss when to use fine-tuning and explore related concepts such as storage formats and chat templates. Finally, we will introduce three popular techniques for implementing SFT: full fine-tuning, Low-Rank Adaptation (LoRA), and Quantization-aware Low-Rank Adaptation (QLoRA).
When to fine-tune
In most scenarios, it is recommended to start with prompt engineering instead of directly fine-tuning models. Prompt engineering can be used...