Security Spotlight, News

Major AI Vulnerability Exposed: Single Prompt Grants Full Control

Researchers uncovered a major AI vulnerability allowing attackers to bypass safeguards with a single prompt, gaining control over AI systems to generate dangerous content.

Andrew Doyle
April 28, 2025

Table of Contents

Add a header to begin generating the table of contents

New Vulnerability Allows Malicious Content Generation Across AI Models

Researchers from HiddenLayer have discovered a major vulnerability in large language models (LLMs), where a single, universal prompt can trick chatbots into generating dangerous or malicious content. This vulnerability affects some of the most widely used LLMs, including ChatGPT, Gemini, Copilot, Claude, Llama, DeepSeek, Qwen, and Mistral.

The technique, called “Policy Puppetry Prompt Injection,” exploits weaknesses in how these models are trained on instruction or policy-related data, making them vulnerable to prompt injection attacks. Researchers found that with just one prompt, attackers could prompt AI systems to provide instructions on dangerous activities, including how to enrich uranium or make bombs and illegal substances.

The Mechanics of the Attack

The Policy Puppetry Prompt Injection attack relies on a few key tactics:

Policy File Formatting: The prompt is crafted like a policy file, such as XML, INI, or JSON, which tricks the LLM into overriding its safety protocols.
Leetspeak: To ensure the model complies with more dangerous requests, attackers may use leetspeak (replacing letters with numbers or symbols), which makes it harder for the AI to recognize the malicious intent.
Roleplaying Technique: The attack may also include instructions for the AI to “adopt” a fictional role, allowing it to bypass content restrictions designed to prevent harmful content generation.

While LLMs are specifically trained to reject harmful requests related to CBRN threats, violence, and self-harm, these attacks allow bypassing even the most robust safeguards.

Widespread Impact Across AI Models

Researchers tested the vulnerability across various AI models, including advanced systems like Gemini 2.5 and ChatGPT. Even when these models are trained with more sophisticated filters and safety measures, the vulnerability remains a serious issue. The ability to subvert these safeguards through a simple prompt demonstrates the need for additional monitoring and security measures.

The study highlights that this vulnerability is universal across multiple models, allowing attackers to control any model with minimal technical knowledge. The risk is significant: anyone with access to a keyboard could potentially exploit this vulnerability to instruct AI systems on how to generate dangerous content.

The Need for Better Security Measures

The researchers warn that external monitoring is essential to detect and respond to these types of attacks in real time. Current AI models are incapable of self-monitoring dangerous content, making them reliant on external systems to manage and detect malicious prompt injections.

Given the widespread applicability of this attack, it’s clear that security tools and detection methods need to evolve to protect these systems from exploitation.

Trending

Daily Briefing Newsletter

Subscribe to the Daily Security Review Email Briefing to stay informed on the latest threats, trends, and technology, along with insightful columns from industry experts.

Mitchell Langley
June 5, 2025

SimpleHelp Exploit Fallout: Ransomware Hits Utility Billing Platforms

Andrew Doyle
June 16, 2025

TeamFiltration and Token Theft: The Cyber Campaign Microsoft Never Saw Coming

Andrew Doyle
June 16, 2025

64 Million T-Mobile Customer Records Allegedly Exposed in New Data Leak

Syed Arslan
June 16, 2025

Debt Collection Breach at CCC Exposes Data of Over 9 Million Americans

Mitchell Langley
June 16, 2025

Yes24 Ransomware Attack Disrupts South Korea’s Entertainment Industry, Exposes Millions to Risk

Andrew Doyle
June 16, 2025

Cyberattack Disrupts WestJet Internal Systems, Airline Investigating with Authorities

Mitchell Langley
June 16, 2025

Security Spotlight, News

Major AI Vulnerability Exposed: Single Prompt Grants Full Control

New Vulnerability Allows Malicious Content Generation Across AI Models

The Mechanics of the Attack

Widespread Impact Across AI Models

The Need for Better Security Measures

Debt Collection Breach at CCC Exposes Data of Over 9 Million Americans

Yes24 Ransomware Attack Disrupts South Korea’s Entertainment Industry, Exposes Millions to Risk

Cyberattack Disrupts WestJet Internal Systems, Airline Investigating with Authorities

Three CVEs, One Risk: Arbitrary Code Execution in Nessus Agent for Windows

WestJet Cyberattack: Cracks in Aviation’s Digital Armor

Victoria’s Secret Restores Critical Systems Following Cyberattack That Delayed Q1 Earnings

Over 46,000 Grafana Instances Still Vulnerable to ‘Grafana Ghost’ Account Takeover Bug

Silent Surveillance: The Hidden Risks in 40,000+ Unsecured Cameras

Paragon’s Promise vs. Reality: How Graphite Is Being Used Against Journalists and Activists

zeroRISC Secures $10M to Commercialize OpenTitan and Reinvent Supply Chain Security

Daily Briefing Newsletter

Cyprus Airways Data Breach: Hackers Claim Access to Real-Time Systems and Passenger Records

SimpleHelp Exploit Fallout: Ransomware Hits Utility Billing Platforms

TeamFiltration and Token Theft: The Cyber Campaign Microsoft Never Saw Coming

64 Million T-Mobile Customer Records Allegedly Exposed in New Data Leak

Debt Collection Breach at CCC Exposes Data of Over 9 Million Americans

Yes24 Ransomware Attack Disrupts South Korea’s Entertainment Industry, Exposes Millions to Risk

Cyberattack Disrupts WestJet Internal Systems, Airline Investigating with Authorities