11 min read · AI · SRE · Security

Five OWASP hurdles for your AI SRE

Twenty OWASP risks, but for an AI SRE agent they collapse into five hard hurdles between a working demo and production you can trust.

June 17, 2026

The OWASP LLM Top 10 and the Agentic Top 10 each tell you what can go wrong. Neither tells you what it takes to put an SRE agent into production without tripping over most of it at once.

An SRE agent lives where the two lists overlap: untrusted inputs, production write access, and nobody awake to catch the mistake. These are the five hurdles that overlap creates, and how we clear them.

Each list is, on its own, a long inventory of separate problems: twenty entries across the two, each asking for its own control. The agentic half is the fresh one, ten risks OWASP's GenAI Security Project published in December 2025. An SRE agent is where that separateness falls apart.

Point an SRE agent at production and the entries stop being independent. They keep landing on the same handful of design decisions, because the agent is close to the worst case either list describes: it reads attacker-influenced data, it holds tools that change production, and it runs when nobody is watching. Risks that stay theoretical for a chatbot are its default operating condition. What follows are the five hurdles that show up when you actually ship one, each a place where entries from both lists collapse into a single choice. We walked through the lists themselves entry by entry elsewhere, once for the LLM Top 10 and once for the agentic risks. This post is about what they add up to.

Hurdle 1: Your inputs are untrusted and your outputs change production

Every signal an SRE agent reads is reachable by someone else. Log lines, alert payloads, ticket descriptions, commit messages, error strings from a third-party API: all of it is written by systems and people the agent doesn't control, and any of it can carry an instruction. The same agent then holds tools that drain nodes, scale deployments, restart services, and run queries against live data. That gap, between where the input comes from and what the output can do, is the bind that defines the whole category.

Prompt injection and goal manipulation aren't exotic here. OWASP ranks prompt injection first on the LLM list, as LLM01, and goal hijacking opens the Agentic list as ASI01. For an SRE agent, those rankings describe the everyday surface rather than a corner case. An attacker doesn't need access to your model, only a log line. If the agent reasons over that text in free form and then decides on a state change from its own conclusion, the injected instruction and the real alert reach the decision down the same path. Improper output handling closes the loop: the model's text becomes a kubectl argument, a SQL string, or a shell command, and the system runs it because it came from the agent.

The reflex that helps most is to treat all operational data as untrusted, which sounds obvious until you notice it reverses the usual assumption that internal logs and alerts are safe because they came from inside the perimeter. For an SRE agent, inside the perimeter is exactly where the untrusted text lives. So the agent should never decide a state change by reasoning freely over that text. Tag it explicitly, wrapping ingested content in a typed block instead of splicing it into the prompt. Have the model emit a structured action validated against a schema rather than a raw command, look up every target in an allowlist, and split read from write at the tool interface.
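The pattern above can be sketched in a few lines. This is a minimal illustration, not Hyground's implementation: the `<untrusted>` tag, the action names, and the target allowlist are all hypothetical, and a real deployment would load them from reviewed configuration.

```python
import html
import json
from dataclasses import dataclass

# Hypothetical allowlists, for illustration only.
READ_ACTIONS = {"get_pods", "get_logs"}
WRITE_ACTIONS = {"restart_deployment", "scale_deployment"}
ALLOWED_TARGETS = {"checkout-api", "search-api"}


def wrap_untrusted(source: str, text: str) -> str:
    """Wrap ingested text in a typed block so it is passed as data, not spliced in."""
    # Escaping angle brackets stops the content from closing the block itself.
    return (
        f'<untrusted source="{html.escape(source)}">\n'
        f"{html.escape(text)}\n"
        "</untrusted>"
    )


@dataclass(frozen=True)
class Action:
    name: str
    target: str
    is_write: bool


def parse_action(model_output: str) -> Action:
    """Accept only a structured action; raise ValueError on anything else."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not JSON") from exc
    if not isinstance(data, dict) or set(data) != {"action", "target"}:
        raise ValueError("expected exactly the keys 'action' and 'target'")
    name, target = data["action"], data["target"]
    if not isinstance(name, str) or not isinstance(target, str):
        raise ValueError("action and target must be strings")
    if name not in READ_ACTIONS | WRITE_ACTIONS:
        raise ValueError(f"unknown action: {name!r}")
    if target not in ALLOWED_TARGETS:
        raise ValueError(f"target not in allowlist: {target!r}")
    # The read/write split is decided here, at the tool interface, not by the model.
    return Action(name=name, target=target, is_write=name in WRITE_ACTIONS)
```

The point of the design is that the model never produces a command string. It produces a name and a target, both of which have to match a list somebody reviewed, and the caller can route anything with `is_write` set to a separate, gated path.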

This is how we handle it at Hyground. By default the agent is read-only, so an injected log line has nothing it can act on, and untrusted input is wrapped in tagged blocks before the model even sees it. Write access has to be a deliberate opt-in, and it should not be enabled without additional safeguards and human supervision.

Hurdle 2: The agent's identity and tools are the blast radius

Whatever standing permissions the agent holds are your worst case, not what it does on a good day but what it can do on the run where the reasoning goes wrong. Excessive agency, tool misuse, and unexpected code execution are one problem seen from three angles: the agent was handed more reach than the task in front of it needed.

Most setups make this worse without meaning to. One broad tool exposes every kubectl verb and collapses "list pods" and "delete namespace" into the same risk tier. A service account gets cluster-admin because scoping it was tedious. A developer's own credentials get passed straight through, so the agent inherits a human's entire access. Once any of that is in place, an injection from the first hurdle stops being a bad suggestion and becomes a bad suggestion with production-wide authority behind it.

Leakage is a quieter but related problem. An over-scoped agent reads secrets it never needed and then drops them into a postmortem, a Slack reply, or the model provider's logs. It's worth assuming the system prompt itself will leak eventually, and keeping credentials and customer-specific logic out of it. None of this is rare. In its 2025 breach study, IBM found that 97% of organizations that suffered an AI-related breach lacked proper AI access controls, and 63% had no AI governance policy in place or were still building one. Scoping is the control teams skip most, and it's the one that sets the blast radius.

The fix is unglamorous. Keep read paths broad and write paths narrow, and gate the writes behind an approval the agent can't generate on its own, so no amount of clever reasoning lets it self-authorize. Scope identity to the task rather than to the person who set it up, don't pass developer credentials to the model, and redact at ingestion so the secrets never arrive in the first place.
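One way to make an approval the agent can't generate on its own is to have a separate approval service sign each request with a key the agent never holds. The sketch below uses an HMAC for that. The function names and the three-field message format are assumptions made for illustration, and a production version would also need expiry and replay protection.

```python
import hashlib
import hmac


def sign_approval(key: bytes, action: str, target: str, approver: str) -> str:
    """Run by the approval service after a human approves; the agent never sees `key`."""
    # Fields are assumed to be validated identifiers that contain no newlines.
    message = "\n".join([action, target, approver]).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def execute_write(key: bytes, action: str, target: str, approver: str, token: str) -> str:
    """Run by the executor: refuse any write whose token doesn't match this exact request."""
    expected = sign_approval(key, action, target, approver)
    # Constant-time comparison on bytes, so a malformed token can't raise or leak timing.
    if not hmac.compare_digest(expected.encode(), token.encode()):
        raise PermissionError("write rejected: missing or invalid approval")
    # Placeholder for the real side effect.
    return f"executed {action} on {target} (approved by {approver})"
```

Because the token is bound to the action and the target, an approval for restarting one service can't be replayed against another, and nothing the model writes can stand in for the key.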

This is a large part of why Hyground runs inside the customer's own cluster, under its own read-only service account rather than a borrowed human login. And because it talks to any LiteLLM-compatible model, you can point it at a self-hosted or sovereign-cloud one and keep your operational data inside your perimeter.

Hurdle 3: What the agent knows can be poisoned, and a confident wrong answer is worse than none

This is the hurdle we'd lose sleep over. History is what makes an SRE agent useful at all. A model that reads logs fast is only an observer, and real root cause analysis lives in the history of a system, the prior incidents, the ownership quirks, the change that looked unrelated at the time. The same history is the surface an attacker can poison.

Poisoned training data, a planted runbook chunk, a tampered embedding, a memory written during one incident and trusted in the next: none of these touch the agent's live inputs. They corrupt what it knows, and they're patient. A single poisoned precedent can sit unused until the night it gets retrieved as the closest match to a real incident, and by then whoever planted it is long gone. It's harder to catch than injection because nothing looks wrong in the moment. The agent is reasoning correctly over knowledge that was wrong before it started.

The version that needs no attacker at all may be the worst of the five. The model produces a confident, fluent, wrong root cause, and a tired engineer at 3 A.M. accepts it because it reads as correct. The classic shape is a model that sees rising request volume and recommends more servers when the real problem is a broken cache: the symptom is real, the story is clean, and the conclusion is wrong. A blank dashboard keeps you looking. An answer in good prose makes you stop.

Grounding is the only real defense. Tie the reasoning to verifiable history, real prior incidents, deployment timelines, ownership records, rather than letting the model narrate from memory. Tag retrieved knowledge with its access scope at ingestion and filter before the model sees it, not after. Refresh what the agent knows on the cadence your runbooks actually change, not once at setup. And make the agent show its work, surfacing the competing signals and naming what it would check next, with remediation kept behind a human so a confident wrong answer costs a second look instead of an outage.
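Filtering before the model sees anything can be as plain as a list comprehension over tagged chunks. This is a sketch under assumptions: the `Chunk` fields, the scope strings, and the 30-day freshness window are invented for the example, and the window in particular should follow how often your runbooks really change.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str            # e.g. a runbook path or incident ID, kept so claims can be checked
    scope: str             # access scope tagged at ingestion
    ingested_at: datetime  # when this copy was last refreshed


def filter_for_model(chunks, caller_scopes, now, max_age=timedelta(days=30)):
    """Keep only chunks the caller may see and that were refreshed recently enough."""
    return [
        c for c in chunks
        if c.scope in caller_scopes and now - c.ingested_at <= max_age
    ]
```

Running this between retrieval and prompt assembly means an out-of-scope or stale chunk never reaches the context window, so there is nothing for the model to quote from it. Keeping `source` on every chunk is what lets the agent cite where a claim came from.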

We built Hyground around this one most deliberately. Every investigation can be checked against the team's own runbooks, past incidents, and the notes earlier investigations left behind, not the model's training data, and any fix it proposes waits on a human before it runs.

Hurdle 4: It will go wrong at scale, so containment can't live in the agent

When an SRE agent fails, it tends to fail in volume. On its own that looks like a runaway loop, a retry that hammers an API or an investigation that recurses while the inference bill climbs and the loop still looks productive. Across a fleet it compounds, because one agent's bad output becomes another agent's trusted input and a small mistake travels faster than anyone can read it.

The uncomfortable part is that the agent isn't the right system to decide when to stop. An agent reasoning about whether to break its own loop is using the same judgment that put it there. Containment has to sit outside the thing being contained. That means budgets and hard ceilings enforced by the orchestrator whether or not the agent agrees, a blast radius bounded per task, no path for one agent to hand another more authority than it started with, and a kill switch that still works on the day the agent itself is the problem.
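As a rough sketch of containment that lives in the orchestrator, the loop below owns the ceilings and the agent only supplies a step function. The names, the `(done, tokens_used)` return shape, and the specific limits are assumptions made for illustration.

```python
import time
from typing import Callable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised by the orchestrator when a run hits a ceiling."""


def run_contained(
    step: Callable[[int], Tuple[bool, int]],
    max_iterations: int,
    max_tokens: int,
    timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call step(i) until it reports done; return the number of iterations used.

    step returns (done, tokens_used). The limits are checked here, outside the
    agent, so the agent's own judgment never decides whether the run continues.
    """
    deadline = clock() + timeout_s
    tokens = 0
    for i in range(max_iterations):
        if clock() >= deadline:
            raise BudgetExceeded("timeout")
        done, used = step(i)
        tokens += used
        if tokens > max_tokens:
            raise BudgetExceeded("token budget")
        if done:
            return i + 1
    raise BudgetExceeded("iteration cap")
```

Note that the timeout here is only checked between steps, so a single step that hangs needs a harder stop, such as running the agent in a separate process that the orchestrator can kill.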

In Hyground the stops that matter sit outside the agent's reasoning: a hard cap on loop iterations, operator-set timeouts, and an abort that ends a run from the outside. The agent doesn't get a vote on when to quit.

Hurdle 5: Nothing here stays solved

The last hurdle barely fits as a single entry on either list. None of the first four is finished the day you build it, and the clearest proof is the supply chain underneath the agent. Every connector and every model is a third-party dependency you've handed credentials and production reach, and an SRE agent carries more of them than most systems because reaching every relevant data source is the whole point.

The connector world already ran this experiment for us. MCP went from integration darling to a steady stream of CVEs and tool-poisoning reports in barely a year, while teams were busy wiring local agents straight into production clusters. The 2025 disclosures weren't minor: a critical remote-code-execution flaw in one widely used MCP tool, and an OS command injection in a connector package with more than 437,000 downloads. A connector you vetted last quarter for being convenient can be the liability this quarter, and nothing pages you when that flips.

That churn is the hurdle. The connectors change, the models change, and a control that was sufficient in March is partial by June, so clearing the first four isn't a project with an end date. It's also the part teams underestimate most when they decide to build in-house. Gartner predicted in 2025 that more than 40% of agentic AI projects will be canceled by the end of 2027, with unclear value and inadequate risk controls among the reasons it cited, and controls that quietly rot are a large part of how a project lands in that 40%. The defenses worth keeping are the boring ones: pin and sign connectors, review them before they get credentials, don't let the agent install its own, track CVEs across every tool in its reach the way you would any production dependency, and prefer a vetted set of integrations over open-ended tool access even though open access always demos better.
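Pinning is the simplest of these to show. The sketch below refuses to load any connector whose bytes don't hash to the digest recorded at review time. The pin table and the connector name are hypothetical, and this covers pinning only: verifying a signature would take a real signing tool on top.

```python
import hashlib
import hmac
from typing import Dict


def is_pinned(name: str, artifact: bytes, pins: Dict[str, str]) -> bool:
    """Return True only if `artifact` matches the SHA-256 digest pinned for `name`."""
    expected = pins.get(name)
    if expected is None:
        # Unknown connectors are rejected, so the agent can't install its own.
        return False
    actual = hashlib.sha256(artifact).hexdigest()
    return hmac.compare_digest(actual.encode(), expected.lower().encode())
```

Checking at load time rather than at install time matters, because a connector that was fine when it was reviewed can be swapped underneath you later. With this check, an upgrade only goes live once someone updates the pin, which is the review step.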

At Hyground we absorb that churn ourselves. Each connector is a version-pinned, separately built image, and a nightly job tracks CVEs across all of them and pulls in upstream fixes as they land, so the maintenance treadmill is ours to run, not the customer's.

The real question

The two lists aren't a reason to keep agents out of production. Each risk on them is the price of a capability you actually wanted: the agent reads your operational data because that's how it helps, and it can act because an observer that can't act is just a faster dashboard. The lists are the bill that comes with those capabilities.

Built well, the capability pays for itself. In the same 2025 IBM study, organizations that used AI and automation heavily in their security operations contained breaches 80 days faster and spent USD 1.9 million less per breach than those that didn't. Clearing these hurdles is how you keep that upside without buying a new class of incident to get it.

We won't pretend we've finished clearing them either. We keep finding edges, and parts of this list are things we got wrong before we got them right. But that's the point: it's work that wants a team whose whole job is to keep at it, not a side quest for a team that already has a product to ship.

Clearing all five once is achievable with enough effort. The real question is whether you can keep clearing them while the ground moves, every month, in production, at 3 A.M., for as long as the agent is plugged into your systems. That's the part worth being deliberate about.

None of this means you can't build it yourself, and plenty of teams will. Keeping the hurdles cleared is the job we've taken on at Hyground, so your team can spend its time on the systems it's paid to run rather than on the upkeep of the agent watching them. If that trade sounds right, talk to the team.

Tim Chen

Author

Forward Deployed Engineer

AI engineer driven by a deep curiosity for technology, with a versatile full-stack background and a focus on security. Thrives on solving complex problems and building impactful solutions. Outside of work, enjoys cooking and Arduino projects.

Keep exploring

Article

What OWASP LLM Top 10 Means for AI SRE Agents

OWASP's LLM Top 10 turns from an abstract risk list into a concrete architecture spec the moment you put an AI agent inside your operations loop.

Article

What the OWASP Top 10 for Agentic Applications Means for AI SRE Agents

OWASP's Top 10 for Agentic Applications is the threat model for AI agents in production systems. Here is what each of the ten risks means for an AI SRE agent.

Article

What is an AI SRE?

An AI SRE is an autonomous, LLM-powered agent that triages alerts, investigates incidents, and finds root causes across production systems without step-by-step human direction. The role is emerging just as AI-generated code pushes operational toil to its first rise in five years. What AI SREs do, where they run, and how to evaluate one.