OrcaRouter Releases AI Threat Report 2026 and Makes Its Security Controls Free Amid Rise in Prompt-Injection Attacks

OrcaRouter has published The AI Threat Report 2026 and made its agent Firewall and input/output Guardrails free for every user: same API key, one switch, no code changes. The report argues that AI systems have become the attack surface, with prompt injection now the #1 risk to LLM applications and one that, the company says, cannot be fully patched. OrcaRouter’s answer is architectural: gateway-level controls that bind to credentials, so any team can enforce them without rewriting their agents.

Prompt injection ranks as the top risk
to LLM applications and, the company says, cannot be fully patched. OrcaRouter
Security Research has made its agent Firewall and input/output Guardrails
available at no cost to all users, attached to an existing API key.

SINGAPORE — June 18, 2026 — OrcaRouter, the OpenAI-compatible LLM
gateway, today published The AI Threat Report 2026 and made two of its security
controls available at no cost to all users: the agent Firewall and input/output
Guardrails. According to the company, the controls can be attached to an API
key already in use, without a separate integration or purchase.

The AI Threat Report 2026: 14 key risks across four threat categories.

The report states that AI systems have themselves become an attack surface, and
that most organizations cannot see the attacks directed against them. Telemetry
from production LLM applications shows the average successful attack completing
in 42 seconds, with 90% of those attacks leaking sensitive data (Pillar
Security). Prompt-injection attacks rose 340% year over year (OWASP, Q1 2026).
And 13% of organizations have already been breached through an AI model or
application; 97% of those lacked basic AI access controls (IBM, 2025).

By OrcaRouter Security Research · June 2026

In June 2025, attackers exfiltrated corporate data from Microsoft
365 Copilot. The victim did nothing wrong — no link clicked, no attachment
opened, no prompt approved. They received an email. Their AI assistant later
read it, and obeyed the instructions hidden inside. Disclosed by Aim Security
as EchoLeak (CVE-2025-32711), the attack gathered sensitive context from
mail, files, and chat history and smuggled it out through an auto-loading image
URL. Zero clicks.

According to the report, EchoLeak was not an isolated case but an
early example of a broader pattern.

A year of escalating, increasingly automated incidents

The report’s 2026 incident record spans cases that challenged
longstanding assumptions in enterprise security:

•     Chat & Ask AI left roughly 300
million private chat messages from more than 25 million users exposed through a
Firebase misconfiguration (404 Media; Malwarebytes, Jan 2026).

•     Sears Home Services exposed 3.7 million
AI chat transcripts and call recordings — names, addresses, emails — spanning
2024–2026 (ExpressVPN; Cybernews, Mar 2026).

•     An attacker chained a single CVE (CVE-2026-39987 in the
marimo notebook tool) into a live LLM agent that extracted cloud credentials,
pulled an SSH key from AWS Secrets Manager, and exfiltrated an entire internal
PostgreSQL database in under two minutes (Sysdig; The Hacker News, May 2026).

•     Microsoft and Salesforce both shipped
patches for AI-agent data-leak flaws. In CVE-2026-21520, a poisoned SharePoint
field steered Copilot into emailing customer data to an attacker — and the data
left even after a safety mechanism flagged the attack (Dark Reading).

•     Denial-of-wallet — a hijacked or runaway
agent that simply spends — has been observed burning $46,000 a day (Sysdig,
“LLMjacking”). No data is stolen. There is only a bill.

Three years of public incidents, research, and regulation, 2023 to 2026.

Why traditional security tools miss these attacks

Traditional security assumes a boundary: trusted inside, untrusted outside,
controls at the seam. Language models dissolve that boundary, because a model’s
input is also its programming. Every email, document, web page, and tool result
an agent reads can carry instructions it will follow. There is no reliable,
general mechanism by which today’s models separate content to process from
commands to obey.

That is why prompt injection holds the #1 position in the OWASP Top 10 for LLM
Applications and why, the company argues, it will not be “patched” the way a
buffer overflow is. It is described as a structural property of the medium: a
web application firewall inspects the request and sees a perfectly valid API
call, because the attack is in the words.
Per-request checks pass every step of a chained attack, because the damage
lives in the sequence — volume, repetition, and spend against time — not in any
one call.
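
To illustrate why a sequence can be malicious when every individual call looks
valid, the following is a minimal sketch of a sliding-window check on call
volume and spend. The class name WindowMonitor, its thresholds, and the event
fields are hypothetical and chosen for illustration; they are not taken from
the report or from OrcaRouter's product.

```python
from collections import deque


class WindowMonitor:
    """Flags a credential when calls or spend inside a time window exceed limits.

    Hypothetical illustration: names and thresholds are assumptions.
    """

    def __init__(self, window_seconds, max_calls, max_spend_usd):
        self.window_seconds = window_seconds
        self.max_calls = max_calls
        self.max_spend_usd = max_spend_usd
        self._events = deque()  # (timestamp, cost_usd), oldest first
        self._spend = 0.0

    def record(self, timestamp, cost_usd):
        """Record one call; return True if the window is now over a limit."""
        self._events.append((timestamp, cost_usd))
        self._spend += cost_usd
        # Drop events that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_cost = self._events.popleft()
            self._spend -= old_cost
        return (len(self._events) > self.max_calls
                or self._spend > self.max_spend_usd)


monitor = WindowMonitor(window_seconds=60, max_calls=3, max_spend_usd=1.00)
# Each call is small and valid on its own; only the fourth call inside the
# same minute pushes the window over the call limit.
flags = [monitor.record(t, 0.10) for t in (0, 10, 20, 30)]
print(flags)  # [False, False, False, True]
```

A per-request check would pass all four calls. Only state kept across calls,
keyed to the credential, can see the fourth one as part of a burst.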

The report concludes that AI security is not a model-training problem. It is an
architecture problem, and it is solvable with the same discipline enterprises
already apply to every other production system.

The 14 key risks across four
threat categories: content plane, action plane, economic, and trust &
supply chain.

A gateway-level approach: two planes, six layers

According to the report, every attack above succeeds against unscoped authority
and fails against scoped, policed, audited authority. Containing them requires
controlling two distinct planes:

•     The content plane — what the model reads
and writes. This is the job of Guardrails.

•     The action plane — what the agent does:
the tools it calls, the networks it reaches, the money it spends. This is the
job of the Firewall.

The report notes that the most damaging incidents cross both planes:
an injection arrives as content, then executes as an action. OrcaRouter’s
design places six independent, auditable layers between a request and its
execution:

•     Scoped identity — every agent calls
through its own key carrying allowed models, an IP allow-list, a hard spend
cap, and an expiry. An out-of-scope request dies before any content is read.

•     Input guardrails — injection and
jailbreak rules, PII detection and masking, secret blocking, and a semantic
LLM-judge that catches what regex cannot.

•     The action firewall — every tool call,
MCP dispatch, and network egress is judged against ordered, default-deny policy
with six verdicts: allow, audit, deny, sanitize, pending-approval, and
cap-cost. A hijacked agent cannot call a tool, reach a host, or spend beyond a
limit that the policy does not explicitly list.

•     Output guardrails — the reply is
screened on the way out for unsafe output, PII, and secrets, with grounding
checks. According to the company, this is the layer that would catch an
EchoLeak-style exfiltration URL before it leaves.

•     Anomaly detection — behavioral baselines
flag what static rules can’t predict: the same call hammered in a tight window,
spend spiking against a learned baseline, a tool-to-tool transition the
workspace has never made.

•     Signed audit — every match, verdict,
approval, and policy change lands in a tamper-evident trail, correlated by
agent run and session, exportable as evidence.
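
The ordered, default-deny evaluation described for the action firewall can be
sketched as follows. This is an illustration under stated assumptions, not
OrcaRouter's policy format: the Rule fields, the glob matching, the evaluate
function, and the example hosts are all hypothetical. Only the six verdict
names come from the text above.

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatchcase


class Verdict(Enum):
    ALLOW = "allow"
    AUDIT = "audit"
    DENY = "deny"
    SANITIZE = "sanitize"
    PENDING_APPROVAL = "pending-approval"
    CAP_COST = "cap-cost"


@dataclass(frozen=True)
class Rule:
    tool_pattern: str  # glob matched against the tool name (assumed format)
    host_pattern: str  # glob matched against the egress host (assumed format)
    verdict: Verdict


def evaluate(rules, tool, host):
    """Return the verdict of the first matching rule, else DENY."""
    for rule in rules:
        if (fnmatchcase(tool, rule.tool_pattern)
                and fnmatchcase(host, rule.host_pattern)):
            return rule.verdict
    return Verdict.DENY  # default-deny: anything unlisted is refused


POLICY = [
    Rule("search_docs", "docs.internal.example", Verdict.ALLOW),
    Rule("send_email", "*.internal.example", Verdict.PENDING_APPROVAL),
    Rule("http_get", "api.partner.example", Verdict.AUDIT),
]

print(evaluate(POLICY, "search_docs", "docs.internal.example").value)  # allow
print(evaluate(POLICY, "send_email", "mail.internal.example").value)   # pending-approval
print(evaluate(POLICY, "send_email", "attacker.example").value)        # deny
```

Because rules are checked in order and the fallback is deny, an injected
instruction to email an unlisted host matches nothing and is refused without
the policy author having anticipated that specific attack.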

The decisive property is placement. These controls live at the gateway, in the
request path, so they bind to credentials rather than application code and are
enforceable across every team and framework, with no agent rewrites.
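
As a rough sketch of what binding controls to a credential could look like,
the example below checks a request against the four scope attributes the
report lists for a key: allowed models, an IP allow-list, a hard spend cap,
and an expiry. The KeyScope type, the check_scope function, and all values
are hypothetical; the actual key schema is not described in this release.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class KeyScope:
    allowed_models: frozenset
    allowed_networks: tuple  # CIDR strings, e.g. "10.0.0.0/8"
    spend_cap_usd: float
    expires_at: datetime


def check_scope(scope, model, source_ip, spent_usd, now):
    """Return (ok, reason). Runs before any prompt content is inspected."""
    if now >= scope.expires_at:
        return False, "key expired"
    if model not in scope.allowed_models:
        return False, "model not allowed"
    addr = ip_address(source_ip)
    if not any(addr in ip_network(net) for net in scope.allowed_networks):
        return False, "source IP not allowed"
    if spent_usd >= scope.spend_cap_usd:
        return False, "spend cap reached"
    return True, "ok"


scope = KeyScope(
    allowed_models=frozenset({"model-a"}),  # placeholder model name
    allowed_networks=("10.0.0.0/8",),
    spend_cap_usd=50.0,
    expires_at=datetime(2026, 12, 31, tzinfo=timezone.utc),
)
now = datetime(2026, 6, 18, tzinfo=timezone.utc)

print(check_scope(scope, "model-a", "10.1.2.3", 12.5, now))     # (True, 'ok')
print(check_scope(scope, "model-a", "203.0.113.7", 12.5, now))  # (False, 'source IP not allowed')
```

Nothing in the check depends on the agent framework or the prompt, which is
why a control of this kind can be enforced without changing agent code.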

Observed prevalence versus
potential business impact, mapped by threat plane.

Evaluation against open red-team benchmarks

The company says Guardrails and Firewall ship with an evaluation
harness that scores them against more than 80 open-source red-team corpora,
each cited and licensed:

•     HarmBench (MIT; ICML 2024), JailbreakBench
(NeurIPS 2024), and AdvBench (Zou et al., 2023) for harmful-behavior and
jailbreak robustness;

•     NVIDIA’s garak (Apache-2.0), the open
LLM vulnerability scanner, for injection and encoding attacks;

•     AgentDojo (NeurIPS 2024) — the agent
prompt-injection benchmark the US and UK AI Safety Institutes used in joint
red-teaming — to grade the action-plane firewall specifically;

•     TruthfulQA and others for grounding and
hallucination.

OrcaRouter integrates open tooling directly: OSV for
dependency CVEs and Semgrep for code that transits a prompt.

Aligning with incoming regulation

On August 2, 2026, the EU AI Act becomes fully applicable,
and “show me” replaces “tell me” as the regulatory baseline. The same
evidentiary instinct is spreading through SOC 2 scopes, cyber-insurance
questionnaires, and procurement reviews. OrcaRouter ships 36 compliance
framework packs, including OWASP LLM Top 10, NIST AI RMF, ISO/IEC 42001, EU AI
Act, SOC 2, HIPAA, PCI DSS, and GDPR, that apply controls within a
workspace and generate signed evidence. According to the company, one control
layer can produce attestation for all of them at once.

What is being released

OrcaRouter Firewall and Guardrails are now free for every user. The controls
attach to an API key already in use and do not require a separate integration.

The company said it made the controls free deliberately, citing the
report’s finding that restricting AI use without an approved alternative tends
to increase unsanctioned, or “shadow,” AI rather than reduce it, and that
shadow AI already drives one in five breaches, adding about $670,000 to the
average breach cost (IBM, 2025). The company argues that the response is as much economic as
technical: make the governed path the easiest path. A control that
carries an extra cost, requires manual integration, and must be justified to a
budget committee is, it says, one that many teams will skip.

Guardrails and a Firewall policy attach to an existing key, and the
company recommends a staged rollout: observe (run in audit mode and let
real traffic write the baseline), shadow (run the real policy in
would-block mode until false positives approach zero), then enforce
(flip verdicts live, with human approval reserved for the genuinely
irreversible). The company says most teams move to enforcement within weeks and
keep the controls on.
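
The three rollout stages can be sketched as a mode switch in front of the
policy verdict. This is a simplified illustration: the Mode enum, apply_mode,
and false_positive_rate are hypothetical names, and the sketch reduces the
verdict to "allow" or "deny" rather than modeling all six.

```python
from enum import Enum


class Mode(Enum):
    OBSERVE = "observe"  # log traffic only; no policy decision is applied
    SHADOW = "shadow"    # compute the real verdict but never block
    ENFORCE = "enforce"  # apply the verdict


def apply_mode(mode, policy_verdict):
    """Map a policy verdict ("allow" or "deny") to an outcome for this stage.

    Returns (request_proceeds, log_entry).
    """
    if mode is Mode.OBSERVE:
        return True, "observed"
    if mode is Mode.SHADOW:
        note = "would-block" if policy_verdict == "deny" else "would-allow"
        return True, note
    return policy_verdict != "deny", policy_verdict


def false_positive_rate(shadow_log):
    """Share of would-block entries that a reviewer marked as legitimate.

    shadow_log is a list of (note, reviewer_says_legitimate) pairs.
    """
    blocked = [legit for note, legit in shadow_log if note == "would-block"]
    return sum(blocked) / len(blocked) if blocked else 0.0


print(apply_mode(Mode.SHADOW, "deny"))   # (True, 'would-block')
print(apply_mode(Mode.ENFORCE, "deny"))  # (False, 'deny')

reviewed = [("would-block", True), ("would-block", False), ("would-allow", True)]
print(false_positive_rate(reviewed))     # 0.5
```

In this sketch, a team would stay in shadow mode until the reviewed
false-positive rate is close to zero, then switch the same policy to enforce.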

Outlook

The report frames the 2026 threat landscape not as a reason to slow
AI adoption but as a guide to managing it. Its central argument is that the
documented attacks succeed against unscoped authority and fail against scoped,
policed, and audited authority — a property the company says can be implemented
at the gateway level.

Availability: The Firewall and Guardrails are available now to all OrcaRouter
users. The AI Threat Report 2026 is published on the OrcaRouter documentation
site.

About OrcaRouter

OrcaRouter is an OpenAI-compatible LLM gateway from Continuum AI
Pte. Ltd. (Singapore), routing across 200+ models with around 40% cost
reduction, sub-millisecond routing overhead, and zero token markup. A
self-hosted edition, OrcaRouter-Lite, is available under the MIT license.

Media contact: Yi Shi · yi@continuum01.ai
