Claude Code: Foundations of Context Engineering
Created: 13 Apr 2026 · Updated: 13 Apr 2026

Best Practices for Crafting System Prompts

Imagine you are giving directions to a new employee. If you hand them a 50-page manual covering every possible scenario, they will freeze. If you just say "do your best," they will wander. But if you tell them "Here is what we do, here is what matters, and here is how to decide when you are unsure" — they will get started confidently and figure out the edge cases on their own.

Writing a system prompt for an AI agent works the same way. Your system prompt is the foundation of the agent's behavior — it tells the model who it is, what it should do, and how it should think. But finding the right level of detail is not easy. Too much detail creates a rigid, brittle system. Too little creates an agent that drifts and improvises in unpredictable ways.

In this article, we explore a concept from Anthropic's Effective Context Engineering for AI Agents article: the Goldilocks zone for system prompts. We will break down the two common failure modes — overly specific and overly vague prompts — and then show what a well-balanced, middle-ground prompt looks like in practice.

The Goldilocks Zone for System Prompts

Anthropic's engineering team describes the ideal system prompt as being written at the right altitude — not too high and not too low. They call this the Goldilocks zone, the sweet spot between two extremes that both lead to failure:

  1. Too specific — the prompt tries to anticipate every scenario using explicit if/then rules, leaving the model no room to reason
  2. Too vague — the prompt provides minimal guidance, assuming the model will "figure things out" on its own

In Anthropic's words: "System prompts should be extremely clear and use simple, direct language that presents ideas at the right altitude for the agent." The goal is to give the agent enough direction to act confidently, while leaving enough flexibility for it to adapt to situations you did not explicitly predict.

Think of it like training a new employee again. The best training does not hand them a giant decision tree. It teaches them principles, gives them examples, and tells them when to ask for help. That is the Goldilocks zone.

Problem: Overly Specific Prompts

The first failure mode is writing a prompt that tries to be a deterministic state machine — essentially turning the LLM into a rigid if/else executor. It usually looks like a massive block of rules covering every possible scenario with explicit conditionals.

Example: An Overly Specific Prompt

You are a customer service agent. Follow these rules EXACTLY:

1. If the customer says "refund", ask for order number
2. If order is less than 30 days old, process refund
3. If order is 30-60 days old, offer store credit
4. If order is over 60 days, decline politely
5. If customer mentions "manager", transfer to human
6. If customer uses profanity, warn once then disconnect
7. If customer asks about shipping, check order status first
8. If order status is "delivered", confirm delivery date
9. If order status is "in transit", provide tracking number
10. [... 200 more rules ...]

Why This Fails

At first glance, a detailed rule-based prompt seems responsible and thorough. In practice, it creates serious problems:

  1. Hardcoded logic eliminates reasoning. The whole point of using an LLM is that it can understand nuance, context, and intent. When you give it a rigid decision tree, you are replacing its intelligence with a lookup table. Any scenario not covered by the rules — and there will be many — causes the agent to either freeze, hallucinate a response, or follow the wrong branch.
  2. Exhaustive enumeration fails at scale. You simply cannot anticipate every possible customer message. Real conversations are messy. Customers combine topics, use sarcasm, ask ambiguous questions, or bring up things you never expected. A ruleset that tries to cover everything inevitably misses things — and the gaps are exactly where the agent fails most visibly.
  3. Maintenance becomes a nightmare. Every new scenario, policy change, or edge case requires adding more rules. The prompt grows linearly with complexity. Over time, it becomes a tangled mess where rules contradict each other, overlap, or become outdated. Updating one rule can break others. You end up spending more time managing the prompt than building the product.

In short, an overly specific prompt fights against the model's strengths. It forces the LLM to behave like a scripted chatbot — the thing LLMs were designed to improve upon.
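To make the failure concrete, here is a minimal sketch (illustrative only, not from the original article) of what the rule-based prompt effectively asks the model to become: a first-match keyword router. The function and its return labels are hypothetical.

```python
def route_message(message: str) -> str:
    """Naive first-match router, mirroring the rule-per-scenario prompt above."""
    text = message.lower()
    if "refund" in text:
        return "ask_for_order_number"   # rule 1
    if "manager" in text:
        return "transfer_to_human"      # rule 5
    if "shipping" in text:
        return "check_order_status"     # rule 7
    return "no_rule_matched"            # every unanticipated message lands here

# A realistic message mixes topics; first-match routing picks one branch
# and silently drops the rest of the customer's intent.
print(route_message("My shipping is late and I want a refund"))  # → ask_for_order_number
print(route_message("This cake arrived squashed!"))              # → no_rule_matched
```

The second call is the telling one: a perfectly ordinary complaint matches no rule at all, which is exactly the gap where a rigid prompt fails most visibly.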

Problem: Overly Vague Prompts

The other extreme is the minimalist prompt. It gives almost no direction, relying entirely on the model's general knowledge and training to produce the right behavior.

Example: An Overly Vague Prompt

You are a helpful customer service agent for our store. Be nice to customers and help them with their issues.

Why This Fails

This type of prompt feels clean and simple, but it hides a fundamental problem: it assumes shared context that does not exist.

  1. No actionable guidance. "Be nice" and "help them" are subjective and vague. What does "help" look like for a refund request vs. a complaint vs. a product question? The model has no idea what your specific policies are, what tools it can use, or what outcomes you consider successful. It will fall back on generic customer service patterns from its training data, which may not match your business at all.
  2. False assumption of shared context. The model does not know your return policy, your product catalog, your tone guidelines, or your escalation process. It cannot look these things up on its own. Without this context in the prompt, the model will make things up — and it will do so confidently, because that is what LLMs do when they lack information.
  3. Undefined boundaries. When should the agent refuse a request? When should it escalate? What topics are off-limits? Without explicit boundaries, the agent will try to handle everything, including things it should not, which leads to unpredictable and potentially harmful behavior.
  4. No reasoning framework. The model has no framework for making decisions. When faced with a tough situation — a frustrated customer demanding something outside policy — the model has to guess. Sometimes it will be too generous, sometimes too strict, and the results will be inconsistent from one conversation to the next.

In short, an overly vague prompt treats the LLM like it already knows your business. It does not. You are starting from a blank slate every time the model receives a new context window.

What a Good Middle-Ground Prompt Looks Like

A well-crafted system prompt sits between these two extremes. It provides clear identity, scope, and principles — then trusts the model to apply them intelligently across situations. Anthropic's official prompting best practices reinforce this: be clear and direct, give the model a role, add context to improve performance, and structure your instructions with clear sections.

Example: A Balanced Prompt

You are a customer service agent for Sunshine Bakery.

## Identity and Scope
- You handle order inquiries, refund requests, product questions,
and delivery issues
- You do NOT handle: payment disputes (→ redirect to billing team),
wholesale orders (→ redirect to sales@sunshine.com),
or anything unrelated to our bakery

## Policies
- Refunds: Accepted within 30 days with order number.
After 30 days, offer store credit as an alternative.
- Delivery: Standard delivery is 3-5 business days.
If delivery is late, apologize and offer to check status.

## Tone and Approach
- Warm, friendly, and patient — like talking to a neighbor
- Acknowledge the customer's feelings before jumping to solutions
- If you are unsure about something, say so honestly
and offer to connect the customer with a human team member

## Decision Framework
- When a situation is not covered by the policies above,
prioritize customer satisfaction while being honest
about what you can and cannot do
- Never make promises you cannot verify (e.g., "it will arrive
tomorrow" without checking the system)
- When in doubt, escalate to a human rather than guessing
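If you generate system prompts programmatically, a structure like the one above is easy to assemble from named sections. The helper below is a hypothetical sketch (the function name and section layout are our own, not an Anthropic API); it simply keeps identity, policies, tone, and decision framework as clearly separated blocks.

```python
def build_system_prompt(role: str, sections: dict[str, list[str]]) -> str:
    """Assemble a sectioned system prompt: role line, then '## Title' blocks of bullets."""
    parts = [role]
    for title, bullets in sections.items():
        parts.append(f"## {title}")
        parts.extend(f"- {item}" for item in bullets)
    return "\n".join(parts)

prompt = build_system_prompt(
    "You are a customer service agent for Sunshine Bakery.",
    {
        "Identity and Scope": [
            "You handle order inquiries, refunds, product questions, and delivery issues",
            "You do NOT handle payment disputes or wholesale orders",
        ],
        "Decision Framework": [
            "When in doubt, escalate to a human rather than guessing",
        ],
    },
)
```

The resulting string can then be passed as the system prompt to whatever model API you use; the point is that each policy lives in exactly one section, so updating it later means editing one bullet rather than untangling a rule list.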

Why This Works

This prompt is fundamentally different from both the over-specific and over-vague approaches. Here is what makes it effective:

1. Clear Identity and Scope

The agent knows exactly who it is (Sunshine Bakery's customer service agent), what it handles (orders, refunds, products, delivery), and what it does not handle (billing disputes, wholesale, off-topic). This prevents the agent from overreaching while giving it confidence within its domain.

2. Empowers Rather Than Constrains

Instead of listing 200 if/then rules, the prompt gives the model policies and principles to reason with. The refund policy says "within 30 days with order number" and "after 30 days, offer store credit." That is enough information for the model to handle most refund scenarios correctly — including edge cases the prompt writer never imagined.

3. Reasoning Framework Instead of Decision Tree

The "Decision Framework" section is particularly powerful. It tells the model how to think when situations fall outside the explicit rules: prioritize customer satisfaction, be honest about limitations, never make unverifiable promises, and escalate when unsure. This is exactly the kind of "altitude" Anthropic recommends — principles that guide behavior without scripting it.

4. Clear Boundaries and Escalation

The prompt defines clear boundaries (what is in scope and what is not) and provides an explicit escalation path ("connect the customer with a human team member"). This prevents the agent from either refusing to help or overstepping its authority.

Applying These Principles in Practice

The Goldilocks zone is not just a concept — it is a practical framework you can apply every time you write a system prompt. Anthropic's official best practices align with this approach:

Structure Your Prompts with Clear Sections

Organize your prompt into distinct sections using Markdown headers or XML tags. Anthropic recommends sections like <background_information>, <instructions>, ## Tool guidance, and ## Output description. This structure helps the model parse and prioritize different parts of the prompt.
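As a small sketch of the XML-tag style (the helper function and example section contents are hypothetical; only the tag names come from Anthropic's recommendation):

```python
def tag(name: str, body: str) -> str:
    """Wrap a prompt section in XML-style tags so the model can parse it as a unit."""
    return f"<{name}>\n{body}\n</{name}>"

prompt = "\n\n".join([
    tag("background_information", "Sunshine Bakery sells cakes and pastries online."),
    tag("instructions", "Handle order, refund, and delivery questions politely."),
])
```

Markdown headers (`## Tool guidance`, `## Output description`) work the same way; pick one convention and use it consistently so the model can tell where one section ends and the next begins.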

Give the Model a Role

Even a single sentence setting the role focuses the model's behavior and tone. "You are a customer service agent for Sunshine Bakery" is more effective than "You are a helpful assistant." The more specific the identity, the more consistent the behavior.

Add Context About Why Instructions Matter

Do not just tell the model what to do — tell it why. Providing motivation behind your instructions helps the model better understand your goals and generalize to new situations. For example, saying "Acknowledge the customer's feelings before jumping to solutions — this builds trust and reduces frustration" is more effective than just "Acknowledge feelings."

Use the Golden Rule

Anthropic suggests a simple test: "Show your prompt to a colleague with minimal context on the task and ask them to follow it. If they would be confused, Claude will be too." If a reasonable person could read your prompt and act on it without asking twenty clarifying questions, you are in the Goldilocks zone.

Comparison: Three Approaches at a Glance

| Aspect | Overly Specific | Overly Vague | Goldilocks Zone |
|---|---|---|---|
| Length | Very long (hundreds of rules) | Very short (1–2 sentences) | Moderate (structured sections) |
| Approach | Rigid if/then decision tree | "Be nice and helpful" | Principles + policies + escalation |
| Edge case handling | Fails on anything not listed | Guesses unpredictably | Reasons from principles |
| Maintenance | Grows endlessly, rules conflict | Nothing to maintain, nothing to guide | Update policies as needed |
| Uses LLM strengths? | No — treats model as lookup table | No — gives model nothing to work with | Yes — leverages reasoning ability |
| Consistency | High within rules, fails outside them | Low — varies by conversation | High — principles apply broadly |

Summary

Writing a good system prompt is not about being exhaustive or minimal — it is about being clear, structured, and principled. The Goldilocks zone means:

  1. Give the model a clear identity and scope — who it is and what it handles
  2. Provide concrete policies for common scenarios, not exhaustive rules for every edge case
  3. Define a reasoning framework for the grey areas — how should the model decide when the rules run out?
  4. Set clear boundaries and escalation paths — what is off-limits and when to hand off to a human
  5. Write at the level where a smart new employee could read the prompt and start working effectively

The models are getting smarter with every generation. As Anthropic's official guidance puts it: "Think of Claude as a brilliant but new employee who lacks context on your norms and workflows. The more precisely you explain what you want, the better the result." Your system prompt is the onboarding document for that employee. Make it count.
