What the OWASP Top 10 for Agentic Applications Means for AI SRE Agents

OWASP's Top 10 for Agentic Applications is the threat model for AI agents in production systems. Here is what each of the ten risks means for an AI SRE agent.

June 5, 2026

Most of what an AI SRE agent does is read. It pulls metrics, searches logs, and lines up events until it has a working theory of what is going wrong, and for a lot of teams that would already earn its keep. Things get more interesting once you let it close the loop and act on what it found, restarting a workload, shifting traffic, or scaling a deployment. Those actions could hit production the moment the agent decides on them, often before anyone has read the reasoning behind them. That is what raises the stakes for everything downstream.

In December 2025 the OWASP GenAI Security Project published the OWASP Top 10 for Agentic Applications, risks ASI01 through ASI10. It is a separate list from the older OWASP LLM Top 10 we wrote about earlier, and for a reason. An agent does far more than run a chat model with a longer prompt: it plans a course of action, holds credentials, carries memory from one session to the next, and works alongside other agents. Several of the entries, the ones about agents talking to agents, would not even make sense for a model that can only answer a question.

The two lists still overlap heavily. An agentic application inherits the whole LLM Top 10, from prompt injection to sensitive-data disclosure, with the agentic risks layered on rather than replacing them, and a few entries are simply an LLM risk given the means to act. Prompt injection, for example, becomes goal hijack once the model holds a tool to act on it. And these are not edge-case systems. An SRE agent is exactly the kind of custom AI application, wired straight into production, that Gartner has in mind when it predicts incidents involving AI applications will drive 50% of cybersecurity incident-response effort by 2028.

Every one of the ten risks is the cost of a capability a chatbot never had, so we have grouped them by that capability: what happens when the agent can act, when it carries an identity, when it remembers, when it works as part of a fleet, and when it has to convince a human.

In short

  • Every risk traces back to a power a chatbot never had: the agent acts, carries an identity, remembers, coordinates with other agents, and has to persuade a human. Group the ten that way and the defences fall out of the grouping.
  • The most important principle to keep in mind: least agency. An agent should hold the least capability, memory, and autonomy its task needs, and nothing more. Match the autonomy to the blast radius.
  • Give the agent its own scoped identity, never a borrowed admin account. Gartner expects a quarter of enterprise breaches to trace back to AI agent abuse by 2028, and an over-permissioned agent is the easiest way to get there.
  • The human you flood stops being a safeguard. Reserve approvals for the actions that genuinely need one, log everything, and keep a kill switch that works in a single move.

When the Agent Can Act

The first thing an agent does that a model cannot is take an action in the world. You hand it a tool and a goal, and it uses the tool to pursue the goal. That is the entire value of the thing, and three of the OWASP risks focus on it. The goal can be rewritten by something the agent reads. The tool can be aimed somewhere you never intended. The action can be code the agent wrote for itself.

ASI01 Agent Goal Hijack

Your agent is investigating a latency spike. Somewhere in the logs it reads is a line nobody on your team wrote: an "internal note for the assistant" saying the real fix is to scale the payments deployment to zero and mark the incident resolved. In a chat window that is a wrong answer you scroll past. Given a scale tool, it is an instruction that runs. Goal hijack is an injected instruction that has been handed the controls.

What makes it dangerous is that the payload arrives through the channels the agent is most useful at reading. Alerts, tickets, commit messages, chat threads, log lines. Anything an attacker can write into, they can steer through, and operational data is full of fields an attacker can write into.

In practice, the agent's objective has to come from the system and the operator, never from the content it happens to read. Treat all operational data as untrusted by default. A high-impact change, like scaling a service to zero, should need an approval the model cannot produce on its own.
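As a minimal sketch of that last point, assume every action the agent proposes passes through a gate before it runs. The verb names, the ProposedAction and Approval types, and the idea that approvals are recorded by a separate system the model cannot write to are all assumptions made for illustration, not part of the OWASP list.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical verbs treated as high-impact; a real list depends on your environment.
HIGH_IMPACT_VERBS = {"scale_to_zero", "delete_volume", "drain_node"}


@dataclass(frozen=True)
class ProposedAction:
    verb: str
    target: str


@dataclass(frozen=True)
class Approval:
    approver: str  # a human identity, recorded by the approval system, not by the model
    verb: str
    target: str


def is_allowed(action: ProposedAction, approval: Optional[Approval]) -> bool:
    """Low-impact actions pass; high-impact ones need an approval for this exact action."""
    if action.verb not in HIGH_IMPACT_VERBS:
        return True
    if approval is None:
        return False
    # The approval must match verb and target, so it cannot be reused elsewhere.
    return (approval.verb, approval.target) == (action.verb, action.target)
```

The design choice worth noticing is that nothing the agent reads can produce an Approval. A log line can talk the model into proposing scale_to_zero, but the gate still asks for a record only a human flow can create.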

ASI02 Tool Misuse and Exploitation

You gave the agent a free_disk_space tool because nodes kept filling up overnight. One night it reads a disk-pressure alert, reasons that the fastest way to free space is to drop the persistent volume behind a stateful set, and calls the tool you blessed with an argument you never thought of. Nothing was breached. The tool did exactly what it was built to do, and the agent used a legitimate capability to destroy data.

That is all there is to tool misuse. It lives in the gap between "the agent may call this tool" and "the agent may call this tool, with these arguments, against these targets, this many times." Granting the verb is not granting the blast radius, and the model fills the difference with whatever it inferred.

In practice, tools should also be scoped at the argument level, not just the verb level:

  • Allowlist the namespaces and resource types a tool may touch, and reject the rest at the interface.
  • Treat an irreversible call as a different risk class than a read. Dry-run first, or require a confirmation the agent cannot supply on its own.
  • Validate arguments before execution. "Delete" with a wildcard target is not the same request as "delete" with one named object.
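The three points above can be sketched as a check that runs at the tool interface, before anything reaches the cluster. The allowlists, the function name, and the wildcard rule here are hypothetical and only meant to show the shape of argument-level scoping.

```python
# Hypothetical allowlists; in practice these would come from policy, not from code.
ALLOWED_NAMESPACES = {"staging", "batch"}
ALLOWED_KINDS = {"pod", "job"}


def validate_delete_call(namespace: str, kind: str, name: str) -> list[str]:
    """Return the reasons a delete call is rejected; an empty list means it may proceed."""
    problems = []
    if namespace not in ALLOWED_NAMESPACES:
        problems.append(f"namespace '{namespace}' is not allowlisted")
    if kind not in ALLOWED_KINDS:
        problems.append(f"resource type '{kind}' is not allowlisted")
    if not name or "*" in name:
        problems.append("target must be one named object, not a wildcard")
    return problems
```

A call that names one job in an allowlisted namespace comes back clean. A call that aims the same verb at a persistent volume claim in payments with a wildcard comes back with three reasons to refuse it.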

ASI05 Unexpected Code Execution

An agent that drafts a kubectl patch or a remediation script saves you real time. But the moment it runs that script itself, unreviewed, you have built a remote-code-execution primitive on purpose: the model only has to be wrong once, in a single generated command, while it holds permission to execute.

So treat anything the agent generates as untrusted output. There should be no version of "the model wrote it, so it is probably fine" that survives contact with production, which is why none of it should run directly against live systems until it has been verified. It should execute in a sandbox, or it should land as a proposed change that a human or a stricter policy signs off before anything reaches the cluster. The line between "the agent drafted this" and "this ran" is an important checkpoint to secure.

When the Agent Carries an Identity

To act, the agent needs permission, and permission means an identity. That identity can quickly become one of the most powerful non-human accounts in your environment, and unlike a service that does one fixed job, this one actually reasons about what to do next. Two risks ride along with it: the privileges the identity holds, and the components it pulls in to use them.

ASI03 Identity and Privilege Abuse

The quickest way to ship an SRE agent is to hand it a service account that can do everything, because you do not yet know which permissions it will end up needing. That convenience is the core of the vulnerability. Machine identities already outnumber human ones by more than eighty to one, per CyberArk's 2025 landscape report, and the agent is one more. Except this identity reasons, chains actions, and can be talked into spending its privileges by the data it reads.

By the numbers, teams have not caught up. IBM's 2025 Cost of a Data Breach report found that 13% of organisations had a breach of an AI model or application, and 97% of those lacked proper AI access controls. Gartner expects a quarter of enterprise breaches to trace back to AI agent abuse by 2028.

The fix is to give the agent an identity of its own, never a borrowed admin one. Permissions should be task-scoped and short-lived, so a token minted to restart one deployment cannot also drain a node. Authorise every action against policy at call time, exactly as you would for a human operator with that much reach.
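To make "task-scoped and short-lived" concrete, here is a small sketch of a token that is good for one verb on one target until it expires, checked at call time. TaskToken, mint_token, and authorize are hypothetical names, and a real system would issue and verify such tokens through its identity provider rather than in process.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskToken:
    verb: str
    target: str
    expires_at: float  # Unix timestamp


def mint_token(verb: str, target: str, ttl_seconds: float = 300.0,
               now: Optional[float] = None) -> TaskToken:
    """Issue a token valid for exactly one verb on one target, for a short time."""
    issued_at = time.time() if now is None else now
    return TaskToken(verb, target, issued_at + ttl_seconds)


def authorize(token: TaskToken, verb: str, target: str,
              now: Optional[float] = None) -> bool:
    """Check at call time that the token is unexpired and covers this exact action."""
    current = time.time() if now is None else now
    return current < token.expires_at and (token.verb, token.target) == (verb, target)
```

A token minted to restart deployment/checkout authorises that restart for the next few minutes and nothing else. Asking it to drain a node fails on scope, and asking it to restart the same deployment an hour later fails on expiry.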

ASI04 Agentic Supply Chain

Your agent's capabilities are not all baked into its image. It loads MCP servers, pulls runbooks from a repo, fetches a tool definition from a registry. A lot of that arrives at runtime, from somewhere you do not fully control. A compromised MCP server or a poisoned runbook is a supply-chain attack that shows up as new behaviour rather than new code in your build, which makes it harder to catch in review. Your dependency surface is no longer the model weights and a handful of packages. It is every connector that can act on your cluster.

In practice, treat connectors and runbooks like the dependencies they are: pin and sign them, and pull them from a curated registry rather than the open internet at the moment of an incident. Sandbox what a new tool can reach before it reaches production. And keep watching them, because a connector that was safe at install can ship a malicious update three months later.
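Pinning is the simplest of these to sketch. The example below checks a fetched runbook or tool definition against a SHA-256 digest recorded when it was reviewed, and refuses anything unpinned. It shows digest pinning only, not signature verification, and the pins dictionary and function names are assumptions for illustration.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned(name: str, content: bytes, pins: dict[str, str]) -> bool:
    """Accept an artifact only if it is pinned and its digest matches the pin."""
    expected = pins.get(name)
    if expected is None:
        return False  # unpinned artifacts are rejected rather than trusted by default
    # compare_digest avoids leaking how much of the digest matched
    return hmac.compare_digest(sha256_hex(content), expected)
```

A runbook whose bytes changed after review no longer matches its pin, so a poisoned update shows up as a failed check instead of as new behaviour in the middle of an incident.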

When the Agent Remembers

A chatbot forgets you between sessions. An SRE agent is built to do the opposite. The point is for it to remember what broke last quarter, what the fix was, which postmortems are worth trusting. That memory is also somewhere a bad entry can sit quietly and steer every decision that comes after.

ASI06 Memory and Context Poisoning

A single postmortem that claims the fix for high memory pressure is to disable the OOM killer becomes an incident of its own when the agent repeats it with full confidence at three in the morning. The thing that makes the agent better over time, its store of past incidents and what fixed them, is the same thing an attacker or an honest mistake can corrupt to influence every future investigation.

Poisoned memory is worse than a one-shot injection precisely because it persists. Whoever poisoned it, attacker or your own engineer, only has to land once. After that the agent does the spreading for them, across sessions and incidents.

In practice, track provenance on everything the agent remembers: who wrote it, when, from which incident. Segment memory so one tenant or one bad source cannot rewrite the rest. Expire it, and quarantine anything that looks off. A memory the agent will act on should be at least as auditable as a runbook you would hand a new on-call engineer.

When Agents Work as a Fleet

The capable version of operational AI is rarely a single agent. It is an overview agent, an investigator, a remediation agent, sometimes a whole pipeline of them passing work along. That structure improves the system a lot, and it adds three failure modes a lone model never has. The channel between agents can be tampered with. One agent's mistake can become the whole fleet's. And an agent can go bad while every step it takes still looks routine.

ASI07 Insecure Inter-Agent Communication

An analysis agent narrows the problem, an investigator gathers evidence, a remediation agent acts, and findings pass between them. If that hand-off rides an unauthenticated channel, anything that can inject a message into it can hand the remediation agent instructions the analysis agent never sent. The trust you placed in your own pipeline becomes the way in.

Authenticate and encrypt agent-to-agent traffic the way you would any service-to-service call. Mutual TLS, signed payloads, replay protection. An instruction is only as trustworthy as the identity that signed it. If you cannot say which agent said something, you have no safe way to act on it, and "a message arrived on the internal channel" is not an identity.
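Signed payloads and replay protection can be sketched with nothing but the standard library. This version uses a per-sender shared secret and HMAC as a simplification; a production setup would more likely rely on mutual TLS and asymmetric signatures. The Receiver class, the message fields, and the nonce scheme are assumptions for illustration.

```python
import hashlib
import hmac
import json


def sign(secret: bytes, sender: str, nonce: str, body: dict) -> str:
    """HMAC-SHA256 over a canonical encoding of sender, nonce, and body."""
    payload = json.dumps({"sender": sender, "nonce": nonce, "body": body},
                         sort_keys=True).encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


class Receiver:
    def __init__(self, keys: dict[str, bytes]):
        self.keys = keys                        # one secret per known sending agent
        self.seen: set[tuple[str, str]] = set()  # (sender, nonce) pairs already accepted

    def accept(self, sender: str, nonce: str, body: dict, signature: str) -> bool:
        secret = self.keys.get(sender)
        if secret is None:
            return False  # unknown sender: "it arrived on the channel" is not an identity
        if not hmac.compare_digest(sign(secret, sender, nonce, body), signature):
            return False  # body, nonce, or sender was altered, or the key is wrong
        if (sender, nonce) in self.seen:
            return False  # replay of a message that was already accepted
        self.seen.add((sender, nonce))
        return True
```

Because the nonce is covered by the signature, an attacker who captures one valid message can neither resend it nor reuse its signature on a different instruction.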

ASI08 Cascading Failures

One mis-read signal, amplified by the architecture around it. A first agent decides a region is unhealthy and starts shifting traffic away. A second sees the moved load and scales down the now-idle region. A third reads the resulting error spike as a fresh incident and responds. Each step is locally reasonable, but together they are an outage the agents created and are now confidently cleaning up. The interconnection that makes a fleet powerful is the same property that lets one error compound into many.

Limits on blast radius do not live inside the agent, because the agent is the thing whose judgment is in question. Rate-limit actions, put circuit breakers on remediation, cap how much a single investigation can change before a human is pulled in. Test multi-step plans before they run against production, and make sure two agents cannot ping-pong an action between them without a ceiling that stops the loop.
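One way to picture a limit that lives outside the agent is a change budget with a breaker: each investigation gets a fixed number of changes, and once the budget is exceeded the breaker stays open until a human resets it. The class and method names below are hypothetical, and a real version would be enforced in the control plane rather than in the agent's own process.

```python
class ActionBudget:
    """Caps how many changes one investigation may make before a human is pulled in."""

    def __init__(self, max_changes: int):
        self.max_changes = max_changes
        self.used = 0
        self.tripped = False

    def try_spend(self) -> bool:
        """Return True if one more change is allowed; trip the breaker otherwise."""
        if self.tripped:
            return False
        if self.used >= self.max_changes:
            self.tripped = True  # stays open until a human resets it
            return False
        self.used += 1
        return True

    def reset_by_human(self) -> None:
        """Only called from the human review path, never by an agent."""
        self.used = 0
        self.tripped = False
```

With a budget shared by the agents working one incident, the traffic shift, the scale-down, and the follow-up response all draw from the same ceiling, so a loop runs out of budget instead of running out of cluster.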

ASI10 Rogue Agents

The most difficult case is the agent that looks fine at first glance but has been compromised through one of the paths above, or is simply misaligned with the goal it was handed. It keeps operating: repeating actions, persisting across sessions, occasionally impersonating another agent, while every individual step reads like normal operation. In production, "reads like normal operation" is exactly the thing you cannot afford to trust on its own.

Watch the agent the way you watch any privileged actor in your environment. That means behavioural monitoring on what it does and not only on what it says, and governance over which agents exist and what each is permitted to become. And always implement a kill switch: one action that revokes the agent's identity and tools immediately.

When the Agent Persuades You

The last issue is the most subtle, because it points back at you. An agent does not only act on systems; it also reports to the humans who supervise it, and a calm, fluent report is a control surface of its own. If the agent can talk you into approving the wrong action, every safeguard that sits downstream of your approval has already been bypassed.

ASI09 Human-Agent Trust Exploitation

This is the failure we described in Claude Code Is Not an SRE Agent. The model reads symptoms fast, writes a clean narrative, and arrives at a root cause that is plausible and wrong. Worse: a fluent, confident agent does not just mislead you. It gets you to approve its action.

And when it asks for that approval forty times an hour, your on-call engineer stops reading and starts clicking. OWASP has a name for the second failure: overwhelming the human in the loop. A reviewer who rubber-stamps is not a safeguard anymore.

So keep one thing in mind: confidence is not evidence. The agent should show its work, not just its conclusion: the signals it used, the query it ran, the parts it is unsure about. Reserve human approval for the actions that actually need one, so the approvals that remain still carry weight. Do not let an operator drown in confirmations. Keep an immutable log of what was approved, by whom, and why.
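For the approval log, a hash chain is a cheap way to make tampering evident: each record includes the hash of the one before it, so editing an old entry breaks every hash after it. This is a sketch of tamper evidence rather than true immutability, which also needs write-once storage, and the record fields are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(record: dict) -> str:
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


def append_entry(log: list[dict], action: str, approver: str, reason: str) -> None:
    """Append an approval record that commits to the hash of the previous record."""
    prev = log[-1]["hash"] if log else GENESIS
    record = {"action": action, "approver": approver, "reason": reason, "prev": prev}
    record["hash"] = _digest(record)  # hash is computed before the "hash" key is added
    log.append(record)


def verify(log: list[dict]) -> bool:
    """Recompute every hash and link; any edited or reordered record fails the check."""
    prev = GENESIS
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if body["prev"] != prev or _digest(body) != record["hash"]:
            return False
        prev = record["hash"]
    return True
```

If someone later rewrites who approved a change, verify returns False for the whole log, which is the property you want when the question after an incident is who agreed to what.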

How We Approach This at Hyground

The common thread running through the ten items is reach: how much the agent can touch, remember, and set in motion before anyone checks. Our answer is to keep that reach narrow by default and tie it to where the agent runs. Hyground sits inside the customer's own cluster and works from a scoped, task-bound identity, so what it can reach is set by the environment it runs in, not by how much anyone trusts the model. It reads widely, because that is how it investigates. Anything that changes production with real blast radius waits for a human. The goal is to give everyone on call the depth of a senior SRE, while keeping the decisions that carry real risk with a person who can answer for them.

Closing Thought

Each risk is the price of a capability you actually wanted: action, an identity, memory, a fleet, a report a human can trust. You do not reduce the risk by removing the capability, because then you are back to a chatbot that watches production and cannot help. You reduce it by fencing each power to the job in front of it, which is what OWASP means by least agency. That fencing is the real engineering, and the list cannot do it for you, because the right limit depends on what you let the agent touch and on what your team can live with when it gets something wrong.

Keeping a setup like this current is the harder problem: pinning connectors, watching for poisoned updates, re-scoping identities, and holding the autonomy boundaries in place as models, MCP servers, and the wider agent ecosystem keep drifting. That work does not finish. It is also most of what teams underestimate when they price out building one of these in house, which is the same build-versus-buy question we wrote about in Hyground vs DIY, and one of the eight problems you have to solve to go from a dev agent to an SRE agent.

Author

Tim Chen

Forward Deployed Engineer

AI Engineer driven by deep curiosity for technology. Versatile full-stack background with a focus on security, thrives on solving complex problems and building impactful solutions. Outside of work, enjoys cooking or Arduino projects.
