From dev agent to SRE agent: eight things your team has to solve

Many teams evaluating AI for SRE and DevOps start the same way.

They experience an LLM doing great work in development and assume it carries over to operations - and it does, but only so far.

You point Claude Code at your cluster, wire up kubectl, ask it to look at a failing pod. It reads the logs, spots the CrashLoopBackOff, suggests a fix. It's genuinely impressive. So you think: we're halfway to an SRE agent. Let's roll it out and make the team faster.

This is the point where the real work begins.

The gap between "Claude Code plus kubectl on my laptop" and "an SRE agent my team trusts in production" is huge. This post outlines the first eight obstacles you will hit, some simple and some tough. None of them are solved by a better prompt, and most aren't solved by the model at all.

01: Collaboration

The first wall you hit is that an agent on your laptop is an agent only you can use.

Ops isn't a solo activity. During an incident you want the whole team looking at the same investigation, in real time. You don't want to copy-paste your analysis from one machine to another, or dump it into some shared drive and hope the formatting survives.

You need common ground: a shared UX layer everyone works on at once. Which means the agent has to leave your machine, live somewhere central and provide a proper collaboration interface.

Most teams seem to think that's a feature you bolt on later. Wrong. It's the foundation everything else sits on.

02: Availability

Incidents don't wait for office hours, but by default your laptop and your engineers only work during them.

In development, you instruct the agent while you're sitting at the keyboard. Operations is different. A lot of the work is unplanned and ad hoc, especially the uncomfortable stuff: at 2am, on a weekend, while the system is under load and something just broke.

For the agent to do useful work while you're away from your desk, it has to run somewhere always-on, 24/7, doing incident analysis on its own. The proactive side of ops is where the value is, and the proactive side is precisely the part a laptop-bound agent can't reach.

03: Autonomy

When you first start out, your agent is a chatbot. You tell it what to do, it does it, it waits for the next instruction.

That covers only half of what ops work actually is. The other half is just-in-time analysis and the toil you never get around to: the checks, the cleanups, the slow-burning issues that never quite make it to the top of the queue.

An SRE agent has to act on schedules and triggers, not just on human prompts. It investigates on its own, picks up new tasks, clears toil without being asked. That takes a whole framework around the model and the triggers that drive it.

This is the real line between a dev agent and an SRE agent: a dev agent always has a human at the root of the trigger tree. An SRE agent doesn't.
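As a rough illustration of "schedules and triggers, not just human prompts", here is a minimal sketch. Every name in it (Trigger, TriggerRouter, the trigger kinds) is hypothetical and made up for this example. A real framework would sit on top of a scheduler and your alerting pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Trigger:
    """An event that can start agent work (hypothetical type)."""
    kind: str      # e.g. "human", "schedule", "alert"
    payload: str   # short description of what fired


@dataclass
class TriggerRouter:
    """Routes triggers to handlers, so a human prompt is one source among several."""
    handlers: Dict[str, Callable[[Trigger], str]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def register(self, kind: str, handler: Callable[[Trigger], str]) -> None:
        self.handlers[kind] = handler

    def dispatch(self, trigger: Trigger) -> str:
        handler = self.handlers.get(trigger.kind)
        # Unknown trigger kinds are ignored rather than guessed at.
        result = handler(trigger) if handler else f"ignored:{trigger.kind}"
        self.log.append(result)
        return result


router = TriggerRouter()
router.register("human", lambda t: f"answer:{t.payload}")
router.register("schedule", lambda t: f"run-check:{t.payload}")
router.register("alert", lambda t: f"investigate:{t.payload}")
```

The point of the shape: the chat prompt becomes just one entry in the handler table, next to timers and alerts that fire with nobody at the keyboard.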

04: Authorization

One kubectl context, and you've just handed the agent your privileges.

Teams move fast: local agent, cluster context, done. Then comes the realization: I just handed everything I'm allowed to do to an agent that acts on its own. That's a serious security and business risk, and it appears the moment things start working well.

You don't want to be the next post-mortem headline: "prod database deleted by our SRE agent."

So access has to be governed and managed centrally: no credentials on a laptop, curated connectors instead of open access, writes that are gated and audited. All of it built and maintained by your team.
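To make "writes that are gated and audited" concrete, here is a minimal sketch under stated assumptions. The verb lists, the ActionGate class and the approved_by field are all hypothetical. In a real setup the enforcement would live in your central access layer (for example cluster RBAC plus an approval workflow), not in the agent's own process.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed classification of verbs, for illustration only.
READ_VERBS = {"get", "list", "describe", "logs"}
WRITE_VERBS = {"apply", "delete", "patch", "scale"}


@dataclass
class ActionGate:
    """Allows reads, requires a named approver for writes, records everything."""
    audit_log: List[Dict[str, object]] = field(default_factory=list)

    def authorize(self, verb: str, target: str,
                  approved_by: Optional[str] = None) -> bool:
        if verb in READ_VERBS:
            allowed = True
        elif verb in WRITE_VERBS:
            allowed = approved_by is not None
        else:
            allowed = False  # default deny for anything not explicitly listed
        # Every decision is audited, including the denied ones.
        self.audit_log.append({
            "verb": verb,
            "target": target,
            "approved_by": approved_by,
            "allowed": allowed,
        })
        return allowed


gate = ActionGate()
```

Two choices matter more than the code itself: unknown verbs are denied by default, and a denied request still leaves an audit entry.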

05: Learning

Yesterday's change is today's incident. Does your agent know that?

An SRE agent has to learn your environment and hold on to what it learns. It needs to connect a change someone made yesterday to the instability you're seeing now. It needs to remember how your systems actually behave, not how the docs say they should.

This is huge - the models don't solve it, not even close. Memory and continuous learning across large, messy infrastructure is an open problem. So you build around the model, and you keep building, because there's no off-the-shelf answer to point at.
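The simplest possible slice of "connect yesterday's change to today's instability" is a time-window lookup over a change log. The sketch below assumes a hypothetical ChangeEvent record and a 24-hour default window. It is nowhere near real memory or learning, only the first building block you would put around the model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class ChangeEvent:
    """A recorded change to a service (hypothetical record shape)."""
    service: str
    description: str
    at: datetime


def recent_changes(changes: List[ChangeEvent], service: str,
                   incident_at: datetime,
                   window: timedelta = timedelta(hours=24)) -> List[ChangeEvent]:
    """Return changes to `service` made within `window` before the incident, newest first."""
    hits = [
        c for c in changes
        if c.service == service
        and timedelta(0) <= incident_at - c.at <= window
    ]
    return sorted(hits, key=lambda c: c.at, reverse=True)
```

Handing the agent this short list alongside the incident is a starting point. Knowing which change actually matters is the part that stays hard.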

06: Navigation

One cluster (or system boundary) works in the demo, but you have tens, even hundreds.

On a single cluster, the agent feels great. Then comes the real question: how does this work across 20 or 30 clusters or system boundaries, and how does the agent tell dev from staging from prod without confusing one for another at exactly the wrong moment?

The answer is scalable context engineering. The agent needs to understand context boundaries and retrieve only what's relevant, instead of having the whole estate dumped in and blowing past the attention budget.

Real infrastructure is never a single cluster. Navigating it efficiently at scale is its own design problem.
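Here is a minimal sketch of boundary-aware retrieval, with assumptions spelled out: ContextDoc and select_context are hypothetical names, the input list is assumed to be pre-ranked by relevance, and a character count stands in for a real token budget.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ContextDoc:
    """A piece of context tagged with the boundary it belongs to (hypothetical)."""
    cluster: str
    environment: str  # e.g. "dev", "staging", "prod"
    text: str


def select_context(docs: List[ContextDoc], environment: str, cluster: str,
                   budget_chars: int) -> List[ContextDoc]:
    """Keep only docs from the requested boundary and stop before the budget is exceeded."""
    selected: List[ContextDoc] = []
    used = 0
    for doc in docs:  # assumed to be ordered most relevant first
        if doc.environment != environment or doc.cluster != cluster:
            continue  # never mix boundaries, however relevant the text looks
        if used + len(doc.text) > budget_chars:
            break
        selected.append(doc)
        used += len(doc.text)
    return selected
```

The boundary filter runs before the budget check on purpose: a prod investigation should run short of context before it quietly picks up dev data.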

07: Security

You've added a privileged actor to your system that can be talked into things.

Now you have two security problems instead of one.

The classic one: a highly-privileged, first-degree entity inside your environment that has to be secured the traditional way. You might already know how to solve this one, but it's still more work.

The new one is AI security: jailbreaks, instruction injection, an attacker who plants malicious text in a log line your agent will read. You have to treat every ingested input as untrusted and filter accordingly. This world is only now emerging, and it's a genuinely big challenge.
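One small piece of "treat every ingested input as untrusted" can be sketched as follows. The wrapper tag, the pattern list and the quarantine_logs name are assumptions made for this example. Pattern matching like this catches only the crudest injections and is no defense on its own.

```python
import re
from typing import List, Tuple

# Deliberately tiny, illustrative pattern list. Not a complete filter.
SUSPICIOUS = re.compile(
    r"(ignore (all )?(previous|prior) instructions|you are now|system prompt)",
    re.IGNORECASE,
)


def quarantine_logs(lines: List[str]) -> Tuple[str, List[int]]:
    """Wrap log lines as clearly delimited data and report indexes of suspicious lines."""
    flagged = [i for i, line in enumerate(lines) if SUSPICIOUS.search(line)]
    # Escape "<" so log content cannot close the wrapper tag early.
    body = "\n".join(line.replace("<", "&lt;") for line in lines)
    wrapped = "<untrusted_log>\n" + body + "\n</untrusted_log>"
    return wrapped, flagged
```

The wrapped text goes to the model marked as data rather than instructions, and the flagged indexes can feed an alert or a stricter review path.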

08: Privacy

Which data may reach which model, hosted where?

Then come data privacy and regulation, and they don't bend. What data is the agent allowed to see? What has to be redacted (PII, passwords, credentials) before anything leaves the building? Which model may process which data, and where is that model physically hosted? Where is the agent itself even allowed to run, and what's it permitted to see once it's there?

For regulated and sovereignty-conscious customers, "it depends on the vendor's privacy policy" isn't an answer. The cleanest version of this runs in-cluster against a swappable or self-hosted model, with zero egress. Getting there is, again, work your team has to do.
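Redaction before anything leaves the building can start as simply as the sketch below. The two patterns and the placeholder strings are illustrative assumptions and far from complete. Real PII and secret detection needs much broader coverage and review.

```python
import re

# Illustrative patterns only: email addresses and key=value style secrets.
PATTERNS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (
        re.compile(r"(?i)\b(password|passwd|token|secret|api[_-]?key)\s*[=:]\s*\S+"),
        r"\1=[REDACTED]",  # keeps the key name, normalizes the separator to "="
    ),
]


def redact(text: str) -> str:
    """Apply each pattern in order and return the redacted text."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

For example, `redact("user=alice@example.com password=hunter2 status=ok")` returns `user=[EMAIL] password=[REDACTED] status=ok`.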

Bonus challenge: keeping up

Everything above changes every few months, and keeps changing.

The best model this quarter is beaten next quarter. The best architecture shifts under you too: yesterday it was several specialised agents each doing their job; today it's a single model spinning up sub-agents dynamically; tomorrow it's something more effective that nobody's named yet.

Whatever you build for the eight problems above, you then have to keep rebuilding as the ground moves. Staying at the frontier is more than a full-time job, indefinitely.

That treadmill, not the first version you ship, is the real cost in any build-vs-buy decision.

So, build or buy?

If you only take one thing from this: the impressive demo and the production system are separated by at least these eight obstacles, and they compound. Each one is real engineering. Solving one badly undermines the others. Realistically, you'll be allocating 5-7 senior developers for a year to build anything that comes close to a solution like Hyground. By then you're a year behind again, and in the age of AI, that's a lot.

This is exactly what we built Hyground to handle: collaboration, availability, autonomy, authorization, learning, navigation at scale, security and privacy. And we keep running the treadmill for you. Your team gets an SRE agent without first spending a year building the platform underneath it.

If you want to see it run, that's the fastest way to judge whether building or buying is right for you. Book a demo: Hyground Demo
