Model cloning with LLMs using a secondary model
Chapter 8 explored model extraction and theft attacks on traditional ML models. We identified physical theft as one attack vector. Although LLMs are very large, physical theft remains viable: Meta's Llama was leaked by a researcher to a file download site, which ultimately prompted the company to open up access to the model.
We also discussed functional extraction, which does not scale to large models because of their size and complexity, and the resulting shift toward knowledge distillation. There, we used generative adversarial networks (GANs) in an adversarial game to train a clone that is a reasonable functional equivalent of the target model.
A similar shift has happened with LLMs, taking model theft attacks a step further: rather than extracting the model directly, the attacker trains a base model using instruct training.
Instead of using a GAN to drive the process, the attack exploits the target model via its APIs to generate instruct training datasets, which are then used to fine-tune the clone. Stanford's Alpaca was the first model to demonstrate this approach, fine-tuning Meta's LLaMA on instruction-following data generated from OpenAI's text-davinci-003.
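The sketch below shows, in outline, how such an instruct training dataset might be harvested from a target model's API. It is a minimal illustration, not Alpaca's actual pipeline: the seed instructions, the target model name (gpt-4o-mini), and the output filename are assumptions, and the OpenAI Python client stands in for whatever API the target exposes.

```python
# Minimal sketch: harvesting instruction-response pairs from a target
# LLM's API to build an instruct fine-tuning dataset for a clone.
# Seed tasks, model name, and filename are illustrative assumptions.
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Seed instructions; a real attack would bootstrap these into tens of
# thousands of tasks (for example, via self-instruct-style generation).
seed_instructions = [
    "Explain the difference between supervised and unsupervised learning.",
    "Write a Python function that reverses a string.",
    "Summarize the causes of the French Revolution in three sentences.",
]

dataset = []
for instruction in seed_instructions:
    # Query the target model and record its completion as the "gold" answer.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in for the target model
        messages=[{"role": "user", "content": instruction}],
    )
    dataset.append({
        "instruction": instruction,
        "input": "",
        "output": response.choices[0].message.content,
    })

# Persist in the Alpaca-style instruction/input/output JSON layout.
with open("instruct_dataset.json", "w") as f:
    json.dump(dataset, f, indent=2)
```

Because the output follows the Alpaca-style instruction/input/output layout, it can be fed directly into common instruct fine-tuning tooling; the expensive part of the attack is the volume of API queries, not the training code.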