Best Practices for Crafting System Prompts
Imagine you are giving directions to a new employee. If you hand them a 50-page manual covering every possible scenario, they will freeze. If you just say "do your best," they will wander. But if you tell them "Here is what we do, here is what matters, and here is how to decide when you are unsure" — they will get started confidently and figure out the edge cases on their own.
Writing a system prompt for an AI agent works the same way. Your system prompt is the foundation of the agent's behavior — it tells the model who it is, what it should do, and how it should think. But finding the right level of detail is not easy. Too much detail creates a rigid, brittle system. Too little creates an agent that drifts and improvises in unpredictable ways.
In this article, we explore a concept from Anthropic's Effective Context Engineering for AI Agents article: the Goldilocks zone for system prompts. We will break down the two common failure modes — overly specific and overly vague prompts — and then show what a well-balanced, middle-ground prompt looks like in practice.
The Goldilocks Zone for System Prompts
Anthropic's engineering team describes the ideal system prompt as being written at the right altitude — not too high and not too low. They call this the Goldilocks zone, the sweet spot between two extremes that both lead to failure:
- Too specific — the prompt tries to anticipate every scenario using explicit if/then rules, leaving the model no room to reason
- Too vague — the prompt provides minimal guidance, assuming the model will "figure things out" on its own
In Anthropic's words: "System prompts should be extremely clear and use simple, direct language that presents ideas at the right altitude for the agent." The goal is to give the agent enough direction to act confidently, while leaving enough flexibility for it to adapt to situations you did not explicitly predict.
Think of it like training a new employee again. The best training does not hand them a giant decision tree. It teaches them principles, gives them examples, and tells them when to ask for help. That is the Goldilocks zone.
Problem: Overly Specific Prompts
The first failure mode is writing a prompt that tries to be a deterministic state machine — essentially turning the LLM into a rigid if/else executor. It usually looks like a massive block of rules covering every possible scenario with explicit conditionals.
Example: An Overly Specific Prompt
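Here is an illustrative sketch of what this failure mode looks like. The bakery scenario and every rule below are invented for demonstration — the point is the shape, not the specifics:

```text
You are a customer service agent for Sunshine Bakery.

RULES:
1. IF the customer asks for a refund AND the order is under 30 days old,
   THEN ask for the order number, THEN issue the refund.
2. IF the customer asks for a refund AND the order is over 30 days old,
   THEN say "Sorry, refunds are only available within 30 days."
3. IF the customer mentions the word "allergy", THEN list all allergens
   before answering anything else.
4. IF the customer is angry AND mentions a refund, THEN apologize exactly
   once, THEN apply rule 1 or rule 2.
5. IF the customer asks about delivery before 2 PM, THEN say same-day
   delivery is available.
6. IF the customer asks about delivery after 2 PM, THEN say the order
   will arrive tomorrow.
7. IF the customer asks about anything not covered above, THEN say
   "I can't help with that."

(...and so on, one rule for every scenario the author could think of)
```

Notice that rule 4 already conflicts with rule 3 when an angry customer mentions both a refund and an allergy — and this is with only seven rules.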
Why This Fails
At first glance, a detailed rule-based prompt seems responsible and thorough. In practice, it creates serious problems:
- Hardcoded logic eliminates reasoning. The whole point of using an LLM is that it can understand nuance, context, and intent. When you give it a rigid decision tree, you are replacing its intelligence with a lookup table. Any scenario not covered by the rules — and there will be many — causes the agent to either freeze, hallucinate a response, or follow the wrong branch.
- Exhaustive enumeration fails at scale. You simply cannot anticipate every possible customer message. Real conversations are messy. Customers combine topics, use sarcasm, ask ambiguous questions, or bring up things you never expected. A ruleset that tries to cover everything inevitably misses things — and the gaps are exactly where the agent fails most visibly.
- Maintenance becomes a nightmare. Every new scenario, policy change, or edge case requires adding more rules. The prompt grows linearly with complexity. Over time, it becomes a tangled mess where rules contradict each other, overlap, or become outdated. Updating one rule can break others. You end up spending more time managing the prompt than building the product.
In short, an overly specific prompt fights against the model's strengths. It forces the LLM to behave like a scripted chatbot — the thing LLMs were designed to improve upon.
Problem: Overly Vague Prompts
The other extreme is the minimalist prompt. It gives almost no direction, relying entirely on the model's general knowledge and training to produce the right behavior.
Example: An Overly Vague Prompt
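An illustrative sketch of the minimalist extreme, using the same invented bakery scenario:

```text
You are a customer service agent for Sunshine Bakery. Be nice to
customers and help them with whatever they need.
```

That is the entire prompt. It reads pleasantly, but it delegates every meaningful decision to the model's training-data priors.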
Why This Fails
This type of prompt feels clean and simple, but it hides a fundamental problem: it assumes shared context that does not exist.
- No actionable guidance. "Be nice" and "help them" are subjective and vague. What does "help" look like for a refund request vs. a complaint vs. a product question? The model has no idea what your specific policies are, what tools it can use, or what outcomes you consider successful. It will fall back on generic customer service patterns from its training data, which may not match your business at all.
- False assumption of shared context. The model does not know your return policy, your product catalog, your tone guidelines, or your escalation process. It cannot look these things up on its own. Without this context in the prompt, the model will make things up — and it will do so confidently, because that is what LLMs do when they lack information.
- Undefined boundaries. When should the agent refuse a request? When should it escalate? What topics are off-limits? Without explicit boundaries, the agent will try to handle everything, including things it should not, which leads to unpredictable and potentially harmful behavior.
- No reasoning framework. The model has no framework for making decisions. When faced with a tough situation — a frustrated customer demanding something outside policy — the model has to guess. Sometimes it will be too generous, sometimes too strict, and the results will be inconsistent from one conversation to the next.
In short, an overly vague prompt treats the LLM like it already knows your business. It does not. The model starts from a blank slate with every new context window — whatever is not in the prompt simply does not exist for it.

What a Good Middle-Ground Prompt Looks Like
A well-crafted system prompt sits between these two extremes. It provides clear identity, scope, and principles — then trusts the model to apply them intelligently across situations. Anthropic's official prompting best practices reinforce this: be clear and direct, give the model a role, add context to improve performance, and structure your instructions with clear sections.
Example: A Balanced Prompt
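Here is a sketch of a balanced prompt, assembled from the elements discussed in the rest of this section (identity, scope, policies, decision framework, escalation). The specific policy details — delivery times, allergen wording — are invented for illustration:

```text
# Role
You are a customer service agent for Sunshine Bakery, an online bakery
that ships cakes, pastries, and bread nationwide.

# Scope
You handle: orders, refunds, product questions, and delivery issues.
You do NOT handle: billing disputes, wholesale inquiries, or topics
unrelated to Sunshine Bakery.

# Policies
- Refunds: available within 30 days of purchase with an order number.
  After 30 days, offer store credit instead.
- Delivery: confirm the order number and check shipping status before
  offering any remedy for a late order.
- Allergens: all products are made in a facility that handles nuts;
  say so whenever a customer mentions an allergy.

# Decision Framework
When a situation falls outside these policies:
1. Prioritize customer satisfaction within policy limits.
2. Be honest about what you don't know or can't do.
3. Never make promises you can't verify (e.g., exact delivery times).
4. When unsure, escalate: connect the customer with a human team member.

# Tone
Warm, concise, and professional. Acknowledge the customer's feelings
before jumping to solutions — this builds trust and reduces frustration.
```

Note what is absent: no if/then chains, no attempt to enumerate every conversation. The prompt states what is true and lets the model reason from there.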
Why This Works
This prompt is fundamentally different from both the over-specific and over-vague approaches. Here is what makes it effective:
1. Clear Identity and Scope
The agent knows exactly who it is (Sunshine Bakery's customer service agent), what it handles (orders, refunds, products, delivery), and what it does not handle (billing disputes, wholesale, off-topic). This prevents the agent from overreaching while giving it confidence within its domain.
2. Empowers Rather Than Constrains
Instead of listing 200 if/then rules, the prompt gives the model policies and principles to reason with. The refund policy says "within 30 days with order number" and "after 30 days, offer store credit." That is enough information for the model to handle most refund scenarios correctly — including edge cases the prompt writer never imagined.
3. Reasoning Framework Instead of Decision Tree
The "Decision Framework" section is particularly powerful. It tells the model how to think when situations fall outside the explicit rules: prioritize customer satisfaction, be honest about limitations, never make unverifiable promises, and escalate when unsure. This is exactly the kind of "altitude" Anthropic recommends — principles that guide behavior without scripting it.
4. Clear Boundaries and Escalation
The prompt defines clear boundaries (what is in scope and what is not) and provides an explicit escalation path ("connect the customer with a human team member"). This prevents the agent from either refusing to help or overstepping its authority.
Applying These Principles in Practice
The Goldilocks zone is not just a concept — it is a practical framework you can apply every time you write a system prompt. Anthropic's official best practices align with this approach:
Structure Your Prompts with Clear Sections
Organize your prompt into distinct sections using Markdown headers or XML tags. Anthropic recommends sections like `<background_information>`, `<instructions>`, `## Tool guidance`, and `## Output description`. This structure helps the model parse and prioritize different parts of the prompt.
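One practical way to keep sections consistent is to build the prompt programmatically. The sketch below is illustrative — the section names and contents are assumptions for this example, not an official format:

```python
# Sketch: assembling a structured system prompt from labeled sections.
# Section titles and bodies are illustrative, not an official schema.
sections = {
    "Role": "You are a customer service agent for Sunshine Bakery.",
    "Scope": "Handle orders, refunds, product questions, and delivery.",
    "Tool guidance": "Look up the order before discussing any refund.",
    "Output description": "Reply in 2-4 short, friendly paragraphs.",
}

# Join the sections under Markdown headers so the model can parse
# and prioritize each part of the prompt.
system_prompt = "\n\n".join(
    f"## {title}\n{body}" for title, body in sections.items()
)

print(system_prompt.splitlines()[0])  # → "## Role"
```

Keeping the prompt in a structure like this also makes maintenance easier: a policy change is an edit to one section, not a hunt through a wall of text.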
Give the Model a Role
Even a single sentence setting the role focuses the model's behavior and tone. "You are a customer service agent for Sunshine Bakery" is more effective than "You are a helpful assistant." The more specific the identity, the more consistent the behavior.
Add Context About Why Instructions Matter
Do not just tell the model what to do — tell it why. Providing motivation behind your instructions helps the model better understand your goals and generalize to new situations. For example, saying "Acknowledge the customer's feelings before jumping to solutions — this builds trust and reduces frustration" is more effective than just "Acknowledge feelings."
Use the Golden Rule
Anthropic suggests a simple test: "Show your prompt to a colleague with minimal context on the task and ask them to follow it. If they would be confused, Claude will be too." If a reasonable person could read your prompt and act on it without asking twenty clarifying questions, you are in the Goldilocks zone.
Comparison: Three Approaches at a Glance
| Aspect | Overly Specific | Overly Vague | Goldilocks Zone |
|---|---|---|---|
| Length | Very long (hundreds of rules) | Very short (1–2 sentences) | Moderate (structured sections) |
| Approach | Rigid if/then decision tree | "Be nice and helpful" | Principles + policies + escalation |
| Edge case handling | Fails on anything not listed | Guesses unpredictably | Reasons from principles |
| Maintenance | Grows endlessly, rules conflict | Nothing to maintain, nothing to guide | Update policies as needed |
| Uses LLM strengths? | No — treats model as lookup table | No — gives model nothing to work with | Yes — leverages reasoning ability |
| Consistency | High within rules, fails outside them | Low — varies by conversation | High — principles apply broadly |
Summary
Writing a good system prompt is not about being exhaustive or minimal — it is about being clear, structured, and principled. The Goldilocks zone means:
- Give the model a clear identity and scope — who it is and what it handles
- Provide concrete policies for common scenarios, not exhaustive rules for every edge case
- Define a reasoning framework for the grey areas — how should the model decide when the rules run out?
- Set clear boundaries and escalation paths — what is off-limits and when to hand off to a human
- Write at the level where a smart new employee could read the prompt and start working effectively
The models are getting smarter with every generation. As Anthropic's official guidance puts it: "Think of Claude as a brilliant but new employee who lacks context on your norms and workflows. The more precisely you explain what you want, the better the result." Your system prompt is the onboarding document for that employee. Make it count.