Defenses and mitigations
There is not a single measure that can, on its own, mitigate the threats we’ve described. Because the content and instructions are mixed in a single NLP input, preventing injection attacks can be daunting. Defenses and mitigations aim to reduce this risk and should be considered part of a defense-in-depth strategy.
Because of the current predominance of proprietary LLMs externally hosted by vendors such as OpenAI, Anthropic, Azure, AWS, and Google, defense in depth will have a shared responsibility with vendors, who are required to offer strong safety guarantees around model hosting and access. We will explore defenses and mitigations at two levels – LLM platform and LLM application – while assuming that the model vendors are responsible for the first one and we are responsible for the application level.
We assume more responsibility for the model level of safety measures for own hosted model scenarios.