Stop Shouting at Your LLM

When we first built our agent system, we leaned heavily on emphasis markers in the system prompt: "IMPORTANT," "CRITICAL," "URGENT." Over time the prompt accumulated more of these markers as we tried to correct edge cases and failures.
In practice, this didn't scale well. As the prompt got "louder," it became harder to maintain a clear priority order.
This is emphasis saturation: when too many instructions are highlighted, the highlights stop helping. The prompt becomes more forceful in tone, but less clear in structure.
What we observed in production
We saw a consistent pattern:
- Adding more emphasis ("URGENT," "ULTRA IMPORTANT," "!!!," bold, repeated warnings) did not reliably improve compliance.
- Over time, emphasis markers lost impact. We kept escalating wording because earlier escalations stopped working.
We do not have a clean quantitative measurement for this; it is an operational lesson from iterating on real prompts with real users.
Why it happens
A useful mental model is that LLMs have a finite capacity to focus on what matters. When you add attention magnets, they compete with the actual instructions. If too many things look "critical," the model has to resolve conflicts rather than execute.
Anthropic's 2025 post on effective context engineering describes this idea as an "attention budget" for context: everything you add competes for limited capacity, so high-signal structure tends to outperform loud wording.
This is not a claim that "bold costs X attention units." It is a practical explanation for what we observed: shouting reduces clarity, and clarity is what models need.
Attention is not the same as token count
This is where teams often overcorrect. The goal is not "short prompts." The goal is low conflict.
A short prompt with competing priorities can perform worse than a longer prompt with a clear hierarchy. In practice:
- A few tokens packed with "must" rules that pull in different directions create conflict.
- A bit more context with one primary objective and supporting constraints often works better.
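To make the contrast concrete, here is a hypothetical side-by-side: the same constraints written as competing absolutes versus a clear hierarchy. The prompt wording is invented for illustration, and counting "MUST" is only a crude proxy for conflict, not a real metric.

```python
# Hypothetical illustration: the same constraints written two ways.

conflicting = """You MUST always answer in under 50 words.
You MUST always cite every source in full.
You MUST never omit relevant details."""
# Three absolute rules that pull in different directions:
# full citations and complete detail rarely fit in 50 words.

prioritized = """Primary objective: answer the user's question accurately.

Supporting constraints, in priority order:
1. Cite sources when you rely on them.
2. Keep answers concise (aim for under 50 words when possible).
3. If constraints conflict, accuracy wins."""

# A crude proxy for conflict: count absolute "MUST" rules.
def count_absolutes(prompt: str) -> int:
    return prompt.upper().count("MUST")

print(count_absolutes(conflicting), count_absolutes(prioritized))  # 3 0
```

The second prompt is longer, but it tells the model what wins when rules collide, so there is nothing left to arbitrate.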
Google Gemini's prompt design strategies recommend be precise and direct, and avoid unnecessary or overly persuasive language.
You spend the user's steering room
The highest cost we saw was not raw model quality. It was loss of steering capacity.

If the system prompt uses every emphasis trick available (all caps, bold, repeated "CRITICAL" markers), it dominates the instruction landscape. Then the user arrives with normal language and normal constraints, and their guidance has little chance to compete.
This compounds when you integrate external systems. Every MCP server and third-party integration comes with its own instructions and constraints. The more external sources you pull in, the louder the baseline becomes, and the harder it is for the user to steer final behavior.
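One cheap way to see this accumulation is to audit the combined instruction surface before any user message arrives. The sketch below is hypothetical: the source names and instruction strings are invented, and the marker list is just the handful we escalated through ourselves.

```python
# Hypothetical audit: before a user message ever arrives, how much of the
# combined instruction surface is already claimed by emphasis markers?
import re

# Invented examples of instruction sources that get concatenated at runtime.
sources = {
    "system_prompt": "CRITICAL: always use the ticketing tool. IMPORTANT: never guess.",
    "mcp_server_a": "IMPORTANT: call list_resources before reading any resource.",
    "third_party_tool": "URGENT: rate limits apply. CRITICAL: include the API version.",
}

EMPHASIS = re.compile(r"\b(CRITICAL|IMPORTANT|URGENT)\b")

for name, text in sources.items():
    print(name, len(EMPHASIS.findall(text)))
```

Every marker counted here is baseline loudness the user's plain-language request has to shout over.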
What worked for us
We got better results by shifting from "louder" to "clearer":
- Prefer structure over emphasis. Use clear sectioning and delimiters (for example XML tags) to separate context, tasks, constraints, and outputs.
- Make priorities explicit. If there is one rule that dominates, state it once, plainly, early.
- Reduce conflict. Remove redundant constraints and overlapping requirements.
- Use emphasis sparingly. If you use it at all, reserve it for exactly one rule, after you have fixed structure and hierarchy.
- Treat prompt tweaks as finite leverage. If repeated iterations do not fix a failure mode, it is often architectural (tooling, retrieval, guardrails, decomposition), not vocabulary.
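The first two points above can be sketched as a small prompt builder: clearly delimited sections instead of escalating warnings, with the one dominating rule stated once, plainly, at the top of its section. The section names, tag choices, and example strings are illustrative, not a standard.

```python
# A minimal sketch of "structure over emphasis": assemble one prompt from
# clearly delimited XML-tagged sections rather than stacking warnings.

def build_prompt(context: str, task: str, constraints: list[str]) -> str:
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        "<context>\n" + context + "\n</context>\n\n"
        "<task>\n" + task + "\n</task>\n\n"
        "<constraints>\n" + constraint_lines + "\n</constraints>"
    )

prompt = build_prompt(
    context="You assist our operations team with incident reports.",
    task="Summarize the incident below in three sentences.",
    constraints=[
        # The one rule that dominates, stated once and first.
        "Primary rule: never include customer names.",
        "Prefer plain language over jargon.",
    ],
)
print(prompt)
```

Note what is absent: no "CRITICAL," no all caps, no repetition. The hierarchy is carried by the structure and by the word "primary," not by volume.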
Applying these principles helped us get better results at Hyground. Are you interested in learning more about how we built our agent system? Check out our post on agentic behavior.
Are you an expert in this field who wants to help design, build, and ship AI systems for operations? Join our team at Hyground.


