Adversarial Attacks with Prompts
In the previous chapter, we started looking at LLMs and how they change AI application development workflows and Adversarial AI. We looked at the evolution sparked by ChatGPT and the paradigm shift toward accessing external hosts via APIs rather than the model directly. With classic model development now done by specialist LLM developer organizations, the solution’s focus has shifted to sending inputs and outputs to the model using API calls.
As we’ll see, these calls use prompts, free-text inputs, mixing content, and instructions for the model and return similarly mixed content as output. These mixed inputs and outputs create new attack vectors for Adversarial AI, such as prompt injection, a term almost synonymous with LLMs.
In this chapter, we will cover the following topics:
- Adversarial inputs with direct prompt injection and the various techniques and approaches to craft adversarial prompts to jailbreak LLM safety controls...