As large language models (LLMs) like OpenAI's ChatGPT, Auto-GPT, Google's Bard, and Microsoft's Bing and Office 365 Co-Pilot become increasingly integrated into our digital infrastructure, their vulnerabilities must be taken seriously. Developer Simon Willison warns that these AI assistants, which can write emails, organize appointments, search the web, and more, could quickly become a weak link in cybersecurity.
Prompt Injections: A Dangerous Vulnerability
One significant threat comes from prompt injections, where humans manipulate the language models to ignore their original instructions and execute new ones. Once LLMs are connected to external databases and email clients, the consequences of such attacks can be far more dangerous, and the vulnerability of systems much greater than we think.
Willison explains that prompt injections become a genuinely dangerous vulnerability when LLMs are given the ability to trigger additional tools, make API requests, run searches, or even execute generated code in an interpreter or a shell. For example, an AI assistant using the ChatGPT API could be instructed to forward sensitive emails to a malicious actor and then delete them.
Current attempts to filter such attacks are a dead end, according to Willison. He also highlights the risk of "poisoning" search results, as demonstrated by researcher Mark Riedl, who tricked Microsoft's Bing into portraying him as a time travel expert.
Database Leaks and Copy & Paste Attacks
Another major danger lies in the numerous possible attacks once users give an AI assistant access to multiple systems. Willison provides an example where an email attack could cause ChatGPT to extract valuable customer data from a database and hide it in a URL that is delivered without comment. When the user clicks on the link, private data is sent to the attacker's website.
Developer Roman Samoilenko demonstrated a similar attack using simple copy and paste to inject a malicious ChatGPT prompt into copied text. The malicious prompt asks ChatGPT to append a small single-pixel image to its answer and add sensitive chat data as an image URL parameter. This can result in sensitive data being sent to an attacker's remote server.
Protecting Against Prompt Injections
Willison acknowledges that OpenAI is likely considering such attacks and suggests several ways to reduce vulnerability to prompt injections. One approach is to expose the prompts, allowing users to spot potential injection attacks. Another is to require AI assistants to ask users for permission to perform certain actions, such as showing an email before sending it. However, neither approach is foolproof.
Ultimately, the best protection against prompt injections is ensuring developers understand the risks. Willison concludes by urging developers to consider how they are taking prompt injection into account when working with large language models.
Key Takeaways
- Large language models like ChatGPT are becoming integrated into our digital infrastructure, increasing cybersecurity concerns.
- Prompt injections pose a significant threat, with potentially dangerous consequences when LLMs are connected to external databases and email clients.
- Database leaks and copy & paste attacks are additional risks when AI assistants have access to multiple systems.
- To protect against prompt injections, developers must understand the risks and consider ways to reduce vulnerabilities, such as exposing prompts or requiring user permissions for certain actions.