A critical out-of-bounds read vulnerability in Ollama — the widely adopted platform for running large language models locally — has left more than 300,000 internet-facing AI servers exposed to remote memory disclosure. Researchers at Cyera disclosed the flaw on May 10, 2026, assigning it CVE-2026-7482 with a CVSS score of 9.1 and the name “Bleeding Llama.” A fix is available in Ollama version 0.17.1.
CVE-2026-7482: How the “Bleeding Llama” Flaw Exploits Ollama’s GGUF Model Parser
Ollama is among the most widely deployed tools for running AI inference on local hardware and private cloud environments. Its broad adoption among enterprises seeking data-residency-compliant AI deployments makes CVE-2026-7482 an urgent concern for security teams managing self-hosted AI infrastructure.
The vulnerability resides in Ollama’s GGUF model file parser, within the source files fs/ggml/gguf.go and server/quantization.go. The GGUF format encodes tensor metadata that describes model weight layout in memory. Ollama reads these values to determine allocation boundaries when loading a model. A specially crafted GGUF file can specify tensor offset and size values that extend beyond the file’s actual length, causing Ollama to read memory past the boundary of its allocated heap buffer — disclosing process memory to an unauthenticated remote attacker.
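The flaw class described above comes down to a missing bounds check: attacker-controlled offset and size fields must be validated against the real file length before any read. The following Go sketch illustrates that check in isolation; the function and field names are illustrative and are not Ollama's actual structures or the shipped fix.

```go
package main

import (
	"fmt"
	"math"
)

// tensorInBounds reports whether a tensor described by GGUF metadata
// fits inside the model file. A crafted file can declare offset/size
// values that run past the end of the mapped buffer; without this
// kind of check, the loader reads adjacent heap memory instead.
func tensorInBounds(offset, size, fileSize uint64) bool {
	if offset > fileSize {
		return false
	}
	// Guard against integer overflow in offset+size.
	if size > math.MaxUint64-offset {
		return false
	}
	return offset+size <= fileSize
}

func main() {
	// A 1 KiB file claiming a tensor that extends 4 GiB past its end.
	fmt.Println(tensorInBounds(512, 4<<30, 1024)) // prints false
	fmt.Println(tensorInBounds(512, 256, 1024))   // prints true
}
```

Note the overflow guard: a naive `offset+size <= fileSize` comparison on unsigned integers can wrap around and pass validation with pathological values.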
How the Three-Step Exploit Chain Reaches Heap Memory Without Authentication
Cyera researcher Dor Attias documented a complete attack sequence requiring no credentials. An attacker first constructs a GGUF file with manipulated tensor shape metadata to force the out-of-bounds read. The file is then uploaded to the target server via Ollama’s /api/create endpoint, which processes incoming model files as part of the standard model-creation workflow. Finally, the attacker instructs the server to push model artifacts to an attacker-controlled registry through the /api/push endpoint, recovering the leaked heap contents from the response.
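Because the chain uses two documented endpoints in sequence, defenders can hunt for it retroactively: a source that issues a model-creation request and then a push request is worth investigating. The sketch below flags that pattern in access logs; the simplified "ip METHOD path" log format is a placeholder to adapt to your proxy or access-log schema.

```go
package main

import (
	"fmt"
	"strings"
)

// suspectChains returns source IPs that sent a POST to /api/create
// followed by a POST to /api/push -- the two endpoints used in the
// Bleeding Llama chain. A hit is a lead for investigation, not proof
// of exploitation: legitimate workflows can use both endpoints.
func suspectChains(lines []string) []string {
	sawCreate := map[string]bool{}
	flaggedSet := map[string]bool{}
	var flagged []string
	for _, ln := range lines {
		f := strings.Fields(ln)
		if len(f) < 3 || f[1] != "POST" {
			continue
		}
		ip, path := f[0], f[2]
		switch {
		case strings.HasPrefix(path, "/api/create"):
			sawCreate[ip] = true
		case strings.HasPrefix(path, "/api/push") && sawCreate[ip] && !flaggedSet[ip]:
			flaggedSet[ip] = true
			flagged = append(flagged, ip)
		}
	}
	return flagged
}

func main() {
	logs := []string{
		"203.0.113.9 POST /api/create",
		"198.51.100.4 POST /api/generate",
		"203.0.113.9 POST /api/push",
	}
	fmt.Println(suspectChains(logs)) // prints [203.0.113.9]
}
```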
No credentials or prior system access are required at any stage. Attias confirmed the technique is practically feasible, and while no in-the-wild exploitation was reported at the time of disclosure, the public documentation of the exploit chain substantially raises that probability.
What Leaked Heap Memory Contains in Enterprise AI Deployments
The memory accessible through a successful Bleeding Llama exploit reflects the live state of the Ollama process at the time of attack. In typical enterprise deployments, this runtime memory includes API keys for upstream AI services, environment variable contents, and records of active LLM inference sessions — including user prompts and model responses.
Organizations that route Ollama requests through shared credential chains to cloud-hosted AI APIs could find those credentials exfiltrated. AI conversation logs that organizations treat as confidential, including those involving sensitive business queries or personally identifiable information submitted to in-house models, may also be present in the leaked memory region.
300,000 Publicly Reachable Servers and the Default-Binding Misconfiguration Risk
Internet scans conducted around the time of disclosure identified more than 300,000 Ollama servers accessible from the public internet and running versions prior to 0.17.1. Ollama’s default configuration binds to all available network interfaces and does not enforce authentication, a design appropriate for local development but problematic when instances are deployed on internet-adjacent infrastructure without additional network controls.
The scale of unintentional exposure reflects a pattern familiar from other developer tools adopted rapidly by organizations: the pace of deployment outstrips the maturation of secure configuration practices, leaving default-open servers reachable without deliberate intent.
Why Enterprise Self-Hosted AI Infrastructure Amplifies the Blast Radius
The exposed population extends well beyond individual developers running local AI experiments. Enterprises that have deployed Ollama to satisfy data residency requirements, reduce inference costs, or maintain privacy over sensitive internal data are represented in that 300,000-server count. Compromise of these systems could expose proprietary model configurations, internal system prompts, and the integration credentials that connect Ollama to downstream business systems.
CVE-2026-7482 is the first critical remote memory disclosure vulnerability to specifically target the AI local-deployment ecosystem, reflecting the expanding attack surface created as large language model infrastructure moves into enterprise production environments.
Applying the Ollama 0.17.1 Patch and Auditing Network Exposure
Ollama version 0.17.1 addresses CVE-2026-7482 and is available through the project’s standard release channels. All organizations running Ollama should upgrade immediately regardless of whether they believe their instances are internet-facing.
Beyond patching, administrators should audit the network exposure of every Ollama deployment. Instances configured to bind to 0.0.0.0 in environments where public internet access is not architecturally required should be restricted to loopback or internal interface addresses through firewall rules or system-level binding configuration. The /api/create and /api/push endpoints implicated in the Bleeding Llama exploit chain should not be accessible from untrusted networks in any operational configuration.
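A first pass at such an audit is simply classifying each deployment's configured bind address. The sketch below flags wildcard and non-loopback binds; it assumes the common host:port convention used by Ollama's OLLAMA_HOST setting, so adapt it to however your fleet's configuration is recorded.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// isPubliclyBound reports whether a configured bind address exposes
// the API beyond the local host. Wildcard binds (0.0.0.0, ::) listen
// on every interface; anything that is not a loopback address is
// treated as reachable from the network.
func isPubliclyBound(bind string) bool {
	host := bind
	if h, _, err := net.SplitHostPort(bind); err == nil {
		host = h
	}
	host = strings.Trim(host, "[]")
	if host == "" || host == "0.0.0.0" || host == "::" {
		return true // wildcard: all interfaces
	}
	ip := net.ParseIP(host)
	if ip == nil {
		// Non-IP hostnames: assume public unless it is "localhost".
		return host != "localhost"
	}
	return !ip.IsLoopback()
}

func main() {
	for _, b := range []string{"127.0.0.1:11434", "0.0.0.0:11434", "10.0.4.7:11434"} {
		fmt.Printf("%s public=%v\n", b, isPubliclyBound(b))
	}
}
```

Treating unresolvable hostnames as public is a deliberately conservative default for an audit script: a false positive costs a manual check, while a false negative leaves a server exposed.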
