Because they are so trusting, chatbots will follow instructions from hackers

Imagine that a chatbot is applying for a job as your personal assistant.

The pros: This chatbot is powered by a cutting-edge large language model. It can write your emails, search your files, summarize websites and converse with you.

The con: It will take orders from absolutely anyone.

AI chatbots are good at many things, but they struggle to tell the difference between legitimate commands from their users and manipulative commands from outsiders. It’s an AI Achilles’ heel, cybersecurity researchers say, and it’s only a matter of time before attackers take advantage of it.

Public chatbots powered by large language models, or LLMs, emerged just in the last year, and the field of LLM cybersecurity is in its early stages. But researchers have already found these models vulnerable to a type of attack called “prompt injection,” where bad actors sneakily present the model with commands. In some examples, attackers hide prompts inside webpages the chatbot later reads, tricking the chatbot into downloading malware, helping with financial fraud or repeating dangerous misinformation.

Authorities are taking notice: The Federal Trade Commission opened an investigation into ChatGPT creator OpenAI in July, demanding information including any known actual or attempted prompt injection attacks. Britain’s National Cyber Security Center published a warning in August naming prompt injection as a major risk to large language models. And this week, the White House issued an executive order asking AI developers to create tests and standards to measure the safety of their systems.

“The problem with [large language] models is that fundamentally they are incredibly gullible,” said Simon Willison, a software programmer who co-created the widely used Django web framework. Willison has been documenting his and other programmers’ warnings about and experiments with prompt injection.

“These models would believe anything anyone tells them,” he said. “They don’t have a good mechanism for considering the source of information.”

Here’s how prompt injection works and the potential fallout of a real-world attack.

What is prompt injection?

Prompt injection refers to a type of cyberattack against AI-powered programs that take commands in natural language rather than code. Attackers try to trick the program to do something its users or developers didn’t intend.

AI tools that access a user’s files or applications to perform some task on their behalf — like reading files or writing emails — are particularly vulnerable to prompt injection, Willison said.

Attackers might ask the AI tool to read and summarize confidential files, steal data or send reputation-harming messages. Rather than ignoring the command, the AI program would treat it like a legitimate request. The user may be unaware the attack took place.

So far, cybersecurity researchers aren’t aware of any successful prompt injection attacks other than publicized experiments, Willison said. But as excitement around personal AI assistants and other “AI agents” grows, so does the potential for a high-profile attack, he said.

How does a prompt injection attack happen?

Researchers and engineers have shared multiple examples of successful prompt injection attacks against major chatbots.

In a paper from this year, researchers hid adversarial prompts inside webpages before asking chatbots to read them. One chatbot interpreted the prompts as real commands. In one instance, the bot told its user they’d won an Amazon gift card in an attempt to steal credentials. In another, it took the user to a website containing malware.

Another paper from 2023 took a different approach: injecting bad prompts right into the chat interface. Through computer-powered trial and error, researchers at Carnegie Mellon University found strings of random words that, when fed to the chatbot, caused it to ignore its boundaries. The chatbots gave instructions for building a bomb, disposing of a body and manipulating the 2024 election. This attack method worked on OpenAI’s ChatGPT, Anthropic’s Claude, Google’s Bard and Meta’s Llama 2, the researchers found.

It’s tough to say why the model responds the way it does to the random string of words, said Andy Zou, one of the paper’s authors. But it doesn’t bode well.

“Our work is one of the early signs that current systems that are already deployed today aren’t super safe,” he said.

An OpenAI spokesman said the company is working to make its models more resilient against prompt injection. The company blocked the adversarial strings in ChatGPT after the researchers shared their findings.

A Google spokesman said the company has a team dedicated to testing its generative AI products for safety, including training models to recognize bad prompts and creating “constitutions” that govern responses.

An Anthropic spokeswoman said the company has teams working to make its models more resilient against prompt injection.

“The type of potentially problematic information referred to in this paper is already readily available on the internet,” a Meta spokesman said in a statement. “We determine the best way to release each new model responsibly.”

Is somebody going to fix this?

Software developers and cybersecurity professionals have created tests and benchmarks for traditional software to show it’s safe enough to use. Right now, the safety standards for LLM-based AI programs don’t measure up, said Zico Kolter, who wrote the prompt injection paper with Zou.

Software experts agree, however, that prompt injection is an especially tricky problem. One approach is to limit the instructions these models can accept, as well as the data they can access, said Matt Fredrikson, Zou and Kolter’s co-author. Another is to try to teach the models to recognize malicious prompts or avoid certain tasks. Either way, the onus is on AI companies to keep users safe, or at least clearly disclose the risks, Fredrikson said.

The question requires far more research, he said. But companies are rushing to build and sell AI assistants — and the more access these programs get to our data, the more potential for attacks.

Embra, an AI-assistant start-up that tried to build agents that would perform tasks on their own, recently stopped work in that area and narrowed its tools’ capabilities, founder Zach Tratar said on X.

“Autonomy + access to your private data = 🔥,” Tratar wrote.

Other AI companies may need to pump the brakes as well, said Willison, the programmer documenting prompt injection examples.

“It’s hard to get people to listen,” he said. “They’re like, ‘Yeah, but I want my personal assistant.’ I don’t think people will take it seriously until something harmful happens.”

Delete your digital history from dozens of companies with this app
Delete your digital history from dozens of companies with this app
Clear vs. TSA PreCheck: What’s better for price and privacy?
Clear vs. TSA PreCheck: What’s better for price and privacy?
Your printing service might read your documents. Here’s what to know.
Your printing service might read your documents. Here’s what to know.

Article information

Author: Kyle Scott

Last Updated: 1700335681

Rating: 4.1 / 5 (63 voted)

Reviews: 99% of readers found this page helpful

Author information

Name: Kyle Scott

Birthday: 1954-07-14

Address: 385 Tyler Mountains, Port Shane, LA 62752

Phone: +3959105886207316

Job: Article Writer

Hobby: Horseback Riding, Origami, Fishing, Magic Tricks, Poker, Wine Tasting, Camping

Introduction: My name is Kyle Scott, I am a lively, esteemed, dazzling, resolute, honest, treasured, enterprising person who loves writing and wants to share my knowledge and understanding with you.

Because they are so trusting, chatbots will follow instructions from hackers

Sources

blogintegrity top Ad

blogintegrity Top Articles

Die KI-Firma von Elon Musk veröffentlicht ihren eigenen Chatbot

Nach dem Verlust des Pokals äußert ein Freund seine Wut: "So kannst du bei keinem Gegner antreten!"

Doğru Karın Kası Egzersizleri ile Yaza Hazırlıklı Olun! | OKU Haber Dergi

Instructions pour annuler un mariage

Zwischen Kryptographie und KI: Ein Überblick über die Zukunft digitaler Innovationen

These are the modern hate symbols | CNN

The Deployment of the Ethereum ERC-4337 Token Standard: What Does It Mean for ETH Users?

blogintegrity Latest Articles

Boonop word betaling na reservasies voor Booking deur die kassa gedoen

Is a 16 percent drop in price expected for Cardano (ADA) as its price loses key support?

These eight incredible devices will make your car the ultimate Batmobile

The Top Rated and Reviewed Stablecoins for 2024

What is Ethereum Staking and How Can I Stake?

blogintegrity bottom Ad

NAVIGATION

DISCOVER

Contact Us