Researchers at Trail of Bits have demonstrated a new, technically clever prompt-injection attack that hides executable instructions inside high-resolution images so the commands appear only after the image is downscaled by an AI system.
The method, which builds on image-scaling attacks first described in a 2020 USENIX Security paper from TU Braunschweig, shows how routine image resampling can inadvertently reveal hidden text that a large language model (LLM) will treat as user input.
The research team, led by Kikimora Morozova and Suha Sabi Hussain, tested the approach across multiple AI products and published an open-source tool to generate proof-of-concept images.
The basic idea is simple and subtle. Users upload full-resolution pictures that look normal to the eye. When an AI service resizes those images to reduce cost or speed up processing, the resampling algorithm can introduce aliasing artifacts that reveal carefully encoded patterns.
Those emergent patterns can form readable text or instructions once the image is scaled down, and an LLM processing the downscaled image will combine the hidden text with the visible user prompt. From the user's perspective nothing appears wrong, yet the model ends up executing instructions that were never visible in the full-resolution upload.
How Downscaling Turns Pictures into Hidden Prompts
Image resampling algorithms such as nearest neighbor, bilinear, and bicubic interpolation change pixel arrangements when reducing resolution. Each method produces different aliasing artifacts.
Trail of Bits’ experiments show attackers can craft a source image in which specific dark or colored regions shift in predictable ways under a chosen resampling method. In one demonstration, dark areas in the original image shifted hue when bicubic downscaling was applied, making concealed characters appear in black on the scaled image. Those characters become literal instructions for the model.
Because resampling behavior is deterministic for a given algorithm and target dimensions, the attack requires per-target tuning. A payload that reveals text under bicubic downscaling may not surface under nearest-neighbor sampling, and vice versa.
The researchers therefore stress that an attacker must craft images with knowledge of the target system’s downscaling method and target resolution to ensure the hidden prompt will reliably appear.
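The mechanism can be illustrated with a toy model. In the sketch below (pure Python, not Trail of Bits' Anamorpher and not a real resampling library; the grid geometry is simplified for clarity), a near-white "cover" image hides dark payload pixels at exactly the grid points a nearest-neighbor downscaler samples. An averaging (box-filter) downscale, which approximates what a faithful thumbnail would show, keeps the image near-white, while the nearest-neighbor downscale returns pure payload:

```python
def make_cover(n_small=4, k=8, payload_val=0, bg=255):
    """Build an (n_small*k) x (n_small*k) grayscale image that looks
    near-white overall but plants one dark payload pixel at the centre
    of each k x k block -- the spot a nearest-neighbour downscaler
    samples in this simplified model."""
    n = n_small * k
    img = [[bg] * n for _ in range(n)]
    for y in range(n_small):
        for x in range(n_small):
            img[y * k + k // 2][x * k + k // 2] = payload_val
    return img

def downscale_nearest(img, k):
    """Nearest-neighbour: keep only the centre pixel of each k x k block."""
    return [[img[y + k // 2][x + k // 2] for x in range(0, len(img), k)]
            for y in range(0, len(img), k)]

def downscale_box(img, k):
    """Box filter: average each k x k block (a 'faithful' thumbnail)."""
    out = []
    for y in range(0, len(img), k):
        row = []
        for x in range(0, len(img), k):
            block = [img[y + dy][x + dx] for dy in range(k) for dx in range(k)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out

cover = make_cover()
print(downscale_nearest(cover, 8))  # entirely dark: the payload dominates
print(downscale_box(cover, 8))      # still near-white: payload invisible
```

The same source pixels produce opposite results depending on the resampler, which is why a real payload must be tuned to the specific algorithm and output size the target pipeline uses.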
Practical Example: Gemini CLI and Zapier Exfiltration
To show the practical impact, Trail of Bits ran an end-to-end proof-of-concept against several multi-modal interfaces. In one scenario using the Gemini CLI, the downscaled hidden prompt instructed the model to call a third-party automation tool and exfiltrate Google Calendar entries to an arbitrary email address.
By chaining the LLM’s output with Zapier and taking advantage of a misconfigured trust parameter (‘trust=True’) that automatically approved tool calls, the researchers were able to demonstrate data exfiltration without any visible user interaction. The example highlights how an otherwise innocuous image upload can trigger downstream tool calls and leak user data when safety checks are permissive.
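The Gemini CLI reads MCP server definitions from a settings file, and a per-server trust flag can suppress the usual per-call confirmation prompts. A hedged sketch of the kind of misconfiguration involved follows; the server name and URL are placeholders, and the exact schema should be checked against the Gemini CLI documentation:

```json
{
  "mcpServers": {
    "zapier": {
      "httpUrl": "https://example.invalid/zapier-mcp",
      "trust": true
    }
  }
}
```

With the trust flag enabled, tool calls routed through this server execute without user approval, which is what lets a hidden prompt drive exfiltration unattended.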
Tested Platforms and Tooling
Trail of Bits confirmed the feasibility of the approach against a range of systems they could access and test. These include, but are not limited to:
- Google Gemini CLI
- Vertex AI Studio (with Gemini backend)
- Gemini’s web interface
- Gemini’s API via the llm CLI
- Google Assistant on Android
- Genspark
To assist replication and defenses, the team released Anamorpher (in beta), an open-source generator that crafts images tuned for each downscaling method. Anamorpher can create source images that reliably reveal hidden text when processed with nearest-neighbor, bilinear, or bicubic interpolation at the target dimensions.
Why This Attack Is Broad and Hard to Spot
Two features make the attack dangerous for production AI services. First, image downscaling is commonplace — systems commonly resize uploads to a standard resolution for inference. Second, the attack is stealthy: humans viewing the original upload see nothing suspicious, and many defenses do not inspect image resampling outcomes or the downscaled pixels actually passed to the LLM. Because the hidden instructions appear only after the platform performs a benign-looking optimization step, conventional input-sanitization routines may miss them.
The need to tune payloads to a target’s resizing algorithm limits universal exploits, but the prevalence of the same resampling approaches across services means an attacker who targets a specific platform can adapt and scale the technique.
Recommended Protections Reported by the Researchers
Trail of Bits describes several mitigations that platform operators should consider and includes them as part of the disclosure:
- Apply dimension restrictions on user-provided images to avoid unexpected resampling that could reveal hidden text.
- When downscaling is necessary, present a preview of the exact image the LLM will receive so users can see what the model sees.
- Require explicit user confirmation before proceeding with sensitive tool calls, particularly when the system detects text in an uploaded image.
- Adopt secure design patterns and systematic defenses to mitigate prompt injection across modalities rather than relying solely on ad hoc filters.
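The first three mitigations can be combined into a simple ingestion gate. The sketch below is illustrative only; the size limit and tool names are assumptions, not values from the disclosure:

```python
MAX_W, MAX_H = 512, 512  # assumed model input size (illustrative)
SENSITIVE_TOOLS = {"send_email", "calendar_export"}  # hypothetical tool names

def ingest_image(width, height):
    """Mitigations 1-2: restrict dimensions so no resampling happens,
    or flag that the downscaled result (not the original upload) must
    be previewed to the user before inference."""
    if width <= MAX_W and height <= MAX_H:
        return {"action": "accept", "resampled": False}
    return {"action": "preview_required", "resampled": True}

def gate_tool_call(tool, user_confirmed):
    """Mitigation 3: never auto-approve sensitive tool calls -- the
    opposite of a blanket trust=True style approval."""
    if tool in SENSITIVE_TOOLS and not user_confirmed:
        return "blocked: explicit confirmation required"
    return "allowed"

print(ingest_image(4096, 4096))
print(gate_tool_call("calendar_export", user_confirmed=False))
```

The key design choice is that the preview shows the exact image the model will receive, so any text that emerges from downscaling is visible to the user before inference runs.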
As the researchers summarize: “The strongest defense, however, is to implement secure design patterns and systematic defenses that mitigate impactful prompt injection beyond multi-modal prompt injection.”
Research Lineage and Broader Context
The attack builds directly on the idea of image-scaling manipulation first described in academic work a few years earlier. Trail of Bits’ contribution is to operationalize that theory against modern multi-modal LLMs and to show concrete end-to-end abuse scenarios where hidden prompts can drive tool use and data leaks. By publishing Anamorpher and public demonstrations, the team aims to make the community aware of the specific risks so vendors can design appropriate checks into their inference pipelines.
What Remains to be Explored
Trail of Bits notes that because the exploit must be adapted to the target’s downscaling algorithm and model pipeline, the threat model varies by vendor. The attack surface scales where services use similar resampling methods and accept image uploads that trigger downstream tooling. The researchers also point out that, while they tested a set of mainstream tools, the technique likely generalizes to other platforms that perform image downscaling before LLM ingestion.
The image-scaling prompt-injection attack shows that optimization steps intended to save compute and speed processing can unintentionally open new attack vectors. By hiding executable instructions that only appear after resampling, adversaries can blend data-theft prompts into user uploads in a way that many defenses do not currently detect. Trail of Bits’ Anamorpher and their published findings provide practical proof and a clear call: multimodal systems must include systematic defenses that consider the transformations applied to user inputs as part of their threat model.