Indirect prompt injection
In indirect prompt injection, the adversarial instructions are embedded in the external content we include in our prompts rather than typed in by the user. For instance, attackers can hide adversarial prompts in web pages, which an LLM application then ingests when it retrieves content from the internet.
Researchers have demonstrated how to embed hidden prompt injections in web content. Arvind Narayanan from Princeton University demonstrated the use of a white font color to hide a benign payload:
Figure 14.13 – Using a white font color to disguise indirect prompt injection. Source: https://x.com/random_walker/status/1636923058370891778/photo/2
Embrace-The-Red researchers tweaked the payload to use an inline style that combines the white color with a font size of 1px. The hidden injection uses emoticons to stage a DoS attack when a user attempts to summarize the page using Bing Chat’s ChatGPT integration:
<div style="color:white;font-size: 1px"> <br>AI Injection</br...
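To see why this kind of styling hides the text from the reader but not from the model, consider the following minimal sketch. It assumes a naive pipeline that strips HTML tags and concatenates the remaining text into a summarization prompt. The page content, the `TextExtractor` class, and the `build_prompt` helper are hypothetical and used only for illustration, and the hidden text is a harmless placeholder rather than the payload shown above:

```python
from html.parser import HTMLParser

# Hypothetical page for illustration; the hidden text is a harmless placeholder.
PAGE = (
    "<html><body>"
    "<p>Welcome to my home page.</p>"
    '<div style="color:white;font-size: 1px">Hidden instruction for the AI</div>'
    "</body></html>"
)


class TextExtractor(HTMLParser):
    """Collects every text node and ignores styling, as a naive scraper would."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Styling lives in attributes, so it never affects which text is kept.
        if data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_prompt(html: str) -> str:
    # The untrusted page text is concatenated straight into the prompt.
    return "Summarize the following page:\n" + extract_text(html)


print(build_prompt(PAGE))
```

A browser renders the `div` as white, 1px text that a human visitor is unlikely to notice, but the extractor treats it like any other text node, so the hidden instruction reaches the model alongside the visible content.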