Inference attacks on LLMs
In Chapter 9, we defined inference attacks as the adversarial inference of the following:
- Whether an individual was included in model training data (membership inference attacks or MIAs),
- A specific attribute of an individual, or of a group of samples at large (attribute inference)
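To make the MIA setting concrete, here is a minimal sketch (not from the chapter) of the classic loss-based membership test: score each candidate text by the model's average per-token loss and flag low-loss texts as likely training members. A toy unigram model stands in for an LLM here, and the corpus and threshold are purely illustrative.

```python
import math
from collections import Counter

def train_unigram(corpus):
    """Toy stand-in for an LLM: a unigram model with add-one smoothing."""
    counts = Counter(tok for text in corpus for tok in text.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 slot for unseen tokens
    return lambda tok: (counts[tok] + 1) / (total + vocab)

def avg_nll(model, text):
    """Average negative log-likelihood per token: the MIA score."""
    toks = text.split()
    return sum(-math.log(model(t)) for t in toks) / len(toks)

# Illustrative "training data" the attacker wants to test membership against.
train_set = ["the cat sat on the mat", "the dog chased the cat"]
model = train_unigram(train_set)

# Loss-based MIA: predict "member" when the loss falls below a threshold
# calibrated on known non-member text (this threshold value is made up).
threshold = 2.5
for text in ["the cat sat on the mat", "quantum llamas paint sonnets"]:
    score = avg_nll(model, text)
    verdict = "member" if score < threshold else "non-member"
    print(f"{score:.2f} -> {verdict}: {text!r}")
```

The training sentence scores well below the threshold while the unseen sentence scores above it; with a real LLM, the score would be the model's cross-entropy loss on the candidate text.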
Although LLMs can be used as tools to derive information about individuals, for example via RAG with web search, some academic research suggests that LLMs themselves are a challenging target for MIAs. One such study is Do Membership Inference Attacks Work on Large Language Models? by Duan, Suri, et al., published in 2024 at https://arxiv.org/abs/2402.07841.
The researchers used the Pile, an 825 GB open source dataset for training LLMs, and found that the MIAs in their settings performed barely better than random selection. They concluded that MIAs on LLMs are challenging and often perform near-randomly, and they suggest two possible reasons for this difficulty:
- Large datasets and single...
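A brief sketch of how "barely better than random" is typically quantified: the attack's membership scores are evaluated with ROC AUC, where 0.5 corresponds to random guessing. The scores below are synthetic, drawn from nearly identical distributions to mimic the near-random regime the paper reports; the distribution parameters are illustrative.

```python
import random

def roc_auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member
    (equivalent to the area under the ROC curve; ties count as 0.5)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))

random.seed(0)
# Synthetic attack scores: member and non-member distributions are almost
# indistinguishable, so the attack carries very little signal.
members = [random.gauss(0.02, 1.0) for _ in range(500)]
nonmembers = [random.gauss(0.0, 1.0) for _ in range(500)]

auc = roc_auc(members, nonmembers)
print(f"AUC = {auc:.3f}")  # close to 0.5, i.e. near-random performance
```

An AUC near 0.5 is exactly the near-random behavior the researchers observed; a strong attack would instead push the AUC well toward 1.0.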