A null byte in a hostname was all it took to break out of Claude Code’s sandbox — and combined with a prompt injection in a GitHub Actions comment, that meant an attacker could steal developer secrets from repositories the AI agent was working on.
Null-Byte Hostname Injection Let Attackers Bypass Claude Code’s Network Allowlist
Security researcher Aonan Guan discovered a SOCKS5 null-byte hostname injection vulnerability in Claude Code’s network allowlist sandbox that allowed the creation of arbitrary outbound connections to attacker-controlled servers, circumventing all network sandbox restrictions. The flaw existed from October 20, 2025 through March 31, 2026, when Anthropic patched it in Claude Code version 2.1.88.
The fix landed in a public commit on March 27 and shipped on March 31, but no security announcement accompanied it. Guan submitted his HackerOne report on April 3, after the fix had already been deployed, and no CVE was assigned. The vulnerability became public knowledge only through Guan's disclosure and subsequent reporting on May 20, 2026.
How the Null-Byte Bypass Defeated the Allowlist Filter
The attack exploited a mismatch between what the allowlist filter checked and what the operating system actually connected to. An attacker constructed a hostname of the form attacker-host.com\x00.google.com. SOCKS5 carries the destination hostname as a length-prefixed field, so the embedded null byte travels intact through the proxy request to the filter. The filter, operating on the full string, saw it end in .google.com and approved the connection. The operating system's C string APIs, however, treat the null byte as a string terminator, so the resolver saw only attacker-host.com and the connection went to the attacker's server instead.
This is a classic null-byte injection pattern: the security check and the runtime system parse the same string differently, and the attacker crafts input that satisfies the check while producing a different outcome at execution time.
Compound Attack Chain: Sandbox Escape Combined with Prompt Injection
The vulnerability was, in Guan’s assessment, “particularly useful in combination with a prompt injection attack.” The compound attack chain worked as follows: an attacker posts a malicious prompt inside a GitHub repository comment — for example, through a pull request review or an issue comment. When a Claude Code agent running in GitHub Actions processes that repository and encounters the injected prompt, it can be directed to establish an outbound connection using the null-byte bypass. That connection reaches an attacker-controlled server, carrying environment variables, API tokens, cloud credentials, and infrastructure secrets that the GitHub Actions runner has access to.
This attack class — prompt injection targeting AI agents in CI/CD pipelines — is distinct from traditional software vulnerabilities. It exploits the instruction-following behavior of the AI model itself as a component in the attack chain. The sandbox escape was the technical enabler; the prompt injection was the delivery mechanism.
Silent Patching and AI Vulnerability Disclosure Norms
Anthropic’s handling of this vulnerability raises questions about disclosure practices for AI company security flaws. The patch was deployed quietly, no CVE was assigned, and no security bulletin was published at the time of the fix. The affected period — October 2025 to March 2026 — spanned five months during which organizations deploying Claude Code agents in GitHub Actions workflows were exposed to the compound attack without knowledge of the risk.
Traditional software vendors typically issue CVEs, security advisories, and release notes that flag security fixes explicitly. As AI coding agents are deployed in increasingly sensitive CI/CD environments with access to cloud credentials and infrastructure secrets, the question of whether AI companies should adopt equivalent disclosure transparency is becoming more consequential.
AI Agent Deployments in CI/CD: An Expanding Attack Surface
The Claude Code sandbox escape illustrates a broader risk that accompanies the deployment of AI coding agents in automated workflows. These agents operate with access to repository secrets, cloud credentials, and service tokens — the same high-value targets that sophisticated attackers pursue through supply chain attacks and credential theft campaigns. A sandbox escape that permits arbitrary outbound connections converts a trusted AI agent into a potential exfiltration vector.
Organizations running AI coding agents in GitHub Actions or other CI/CD systems should review what secrets those agents have access to, apply the principle of least privilege to agent permissions, and ensure that network egress from agent environments is controlled and monitored.
