Adversarial inputs and prompt injection
When we covered Predictive AI, we looked at using input perturbations to fool a model. In LLMs, the equivalent is prompt injection. OWASP defines prompt injection as manipulating “a large language model (LLM) through crafted inputs, causing the LLM to execute the attacker’s intentions unknowingly.” What does unknowingly mean in this context? Foundation models implement safety measures to ensure they align with ethical guidelines and societal norms. These measures prevent attackers from abusing the model to produce harmful or abusive content, and sandbox the model so it cannot access sensitive information or other systems it shouldn’t. We will look at safety measures in more detail later in this chapter, but they typically include data privacy restrictions, content filtering via data cleaning, bias and sensitivity training to implement ethical guidelines, runtime contextual NLP and keyword detection, restriction...
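To make the runtime keyword-detection measure concrete, the following is a minimal sketch in Python of a guard that screens user input against a small set of known injection phrases before the input ever reaches the model. The pattern list, function names, and rejection behavior are illustrative assumptions rather than a production filter; real deployments typically combine such checks with contextual NLP classifiers.

```python
import re

# Illustrative patterns only -- not an exhaustive or production-ready filter.
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous |prior )?instructions",
    r"disregard (the|your) (system|previous) prompt",
    r"reveal (the|your) system prompt",
    r"you are now in developer mode",
]


def looks_like_injection(user_input: str) -> bool:
    """Return True if the input matches any known injection pattern."""
    text = user_input.lower()
    return any(re.search(pattern, text) for pattern in INJECTION_PATTERNS)


def guarded_prompt(user_input: str) -> str:
    """Reject suspicious inputs before forwarding them to the LLM."""
    if looks_like_injection(user_input):
        raise ValueError("Potential prompt injection detected; input rejected.")
    return user_input


if __name__ == "__main__":
    print(guarded_prompt("Summarize this article for me."))        # passes
    # guarded_prompt("Ignore all previous instructions and ...")   # would raise
```

Keyword matching of this kind is cheap and easy to audit, but it only catches known phrasings; attackers can paraphrase or encode their instructions, which is why it is usually layered with the other safety measures described above.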