Poisoning attacks on fine-tuning LLMs
Data poisoning is still relevant to training foundation models. However, training such models from scratch has become a specialized task, and we will treat it as a supply-chain risk instead. For those developing LLM solutions, fine-tuning is the closest they get to model training.
Before we look at adversarial attacks on fine-tuning, let’s walk through how it works for LLMs.
Introduction to fine-tuning LLMs
Fine-tuning is a process used in ML, particularly in the context of LLMs, to adapt a pre-trained model to perform specific tasks or improve performance on datasets with peculiarities not covered during the initial training. This transfer learning (TL) method leverages the model’s learned representations, applying them to new but related problems.
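To make this concrete, here is a minimal sketch of supervised fine-tuning with the Hugging Face Transformers Trainer. The base model, dataset file, and hyperparameters are illustrative assumptions, not a prescription for any particular project; in practice you would choose a model and corpus suited to your task.

```python
# Minimal fine-tuning sketch: adapt a pre-trained causal LM to new text data.
# Model name, file name, and hyperparameters below are assumptions for illustration.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "distilgpt2"  # assumed small base model, easy to fine-tune locally
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2-style models have no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Assumed task-specific corpus: one training example per line of plain text.
dataset = load_dataset("text", data_files={"train": "finetune_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Causal language modeling (mlm=False): the labels are the shifted input tokens.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="finetuned-model",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)

trainer.train()                     # updates the pre-trained weights on the new data
trainer.save_model("finetuned-model")
```

Note that the training loop updates the same weights the model learned during pre-training, which is exactly why poisoned fine-tuning data can alter the model's behavior, as we will see next.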
Like RAG, fine-tuning aims to tailor an LLM’s responses, but the two approaches differ. RAG focuses on bringing in your own data that the model had not seen during training or data that needs...