Context Engineering for AI Agents
If you have been working with AI agents — whether coding assistants like Cursor and Claude Code, or custom agents you built for your company — you have probably noticed something important: it all boils down to a prompt being sent to a Large Language Model (LLM), and a lot of engineering around it.
There is some truth in calling applications like Cursor and Claude Code "just wrappers around LLMs." However, building a really good wrapper requires deep knowledge and serious engineering work. The system that surrounds the LLM — often called an agent harness — is where most of the real engineering lives. It manages tool calls, controls the agent loop, handles errors, enforces guardrails, and, most importantly, decides what context is sent to the model at each step.
This article introduces context engineering, explains why it has become a critical concept when building and using modern AI agents, and shows how poor context handling leads to degraded performance, higher costs, hallucinations, and inconsistent behavior. By the end, you will have a clear mental model for context engineering and understand how it is applied in practice.
Core Concepts
What Is Context?
Every time you call an LLM, you send it a context window — a block of text that includes everything the model needs to generate a response. Think of it as the model's short-term memory: it can only "see" what you put in that window. Anything outside of it simply does not exist for the model.
Context can come from multiple places:
- The developer of the application — system prompts, tool definitions, instructions baked into the agent harness
- The user — the current message, preferences, custom instructions
- Previous interactions — conversation history, tool call results, retrieved documents, external data
Every day, new sources of context are added. Memory systems, file contents, search results, database queries, API responses — all of these can become part of the context. And the amount of context keeps increasing.
The Agent Harness
The agent harness is the orchestration layer that wraps around the LLM. It is responsible for:
- Managing tool calls and their outputs
- Controlling the agent loop (deciding when to call the LLM again and when to stop)
- Handling errors and retries
- Enforcing guardrails and safety constraints
- Deciding what context is sent to the model at each step
In practice, the model invocation itself is often straightforward. What determines whether an agent works reliably is how the surrounding harness manages state, tools, memory, and context. Most of the real engineering does not live inside the LLM call — it lives around it.
From Prompt Engineering to Context Engineering
In the early days of working with LLMs, we believed that prompt engineering was enough. We thought that writing carefully crafted prompts could fix problems and give us what we want. And for simple, single-turn tasks, that was often true.
The issue, however, is that prompts are static, while context is extremely dynamic. A static prompt cannot adapt to the changing state of a conversation, the growing results from tool calls, or the shifting needs of a multi-step task.
If context is dynamic, then constructing the correct context requires a dynamic system as well. It is no longer just about writing a clever prompt template. This is why we are entering the realm of context engineering — the natural evolution of prompt engineering, but a much deeper concept.
| Aspect | Prompt Engineering | Context Engineering |
|---|---|---|
| Focus | Crafting the right instruction text | Building the right context dynamically |
| Nature | Static — written once, used many times | Dynamic — assembled at runtime |
| Scope | Single LLM call | Entire agent lifecycle (multi-turn, multi-tool) |
| Who controls it | Developer (mostly) | Developer, user, and the system itself |
| Techniques | Few-shot examples, role prompts, chain-of-thought | Context selection, compression, isolation, memory management |
Why Context Matters — Garbage In, Garbage Out
We all know the saying "garbage in, garbage out." This is one of the most common reasons why agentic systems underperform. They are simply not provided with the right context.
LLMs cannot read our minds. We need to give them the right information. And it is not always just data — sometimes we need to give them the correct tools so they can fetch information, take actions, and perform tasks on our behalf.
Modern LLMs are getting better and better at reasoning. With tool calling, we can build AI agents that invoke tools, receive outputs, and loop until tasks are completed. This is extremely powerful, but it introduces a new challenge: context growth.
The Context Growth Problem
When an agent runs a long, complex task, it accumulates outputs from many tool calls. Each tool call adds results to the conversation. Each LLM response adds reasoning. Over multiple turns, the context window keeps growing, filled with tool call results and intermediate outputs.
Imagine an agent that needs to:
- Read 10 files from a codebase
- Run a test suite and collect the output
- Search the web for documentation
- Apply a fix and verify it works
By step 4, the context window may contain thousands of tokens from file contents, test outputs, search results, and the agent's own reasoning. Much of that content is no longer relevant, but it is still sitting in the context window, consuming tokens and influencing the model.
This leads to several problems:
- Context window limit exceeded — The model simply cannot accept more input, and the agent breaks
- Cost and latency increase — More tokens mean higher API costs and slower responses
- Agent performance degrades — The model struggles to find the relevant information among irrelevant noise
If nothing is done, this degradation becomes unavoidable. The agent starts making worse decisions, hallucinating, or going in circles. This is not a hypothetical problem — it is the default behavior of any unmanaged context system.
Context Failures: Poisoning, Confusion, and Clash
When context is allowed to grow without structure, selection, or control, specific failure modes start to appear:
Context Poisoning
This happens when a hallucination from a previous tool call or LLM response enters the context and starts affecting future outputs. For example, if the agent hallucinates a function name that does not exist and that hallucination stays in the context, subsequent steps may reference and build upon that non-existent function. The error propagates forward.
Context Confusion
This occurs when irrelevant context influences the response, even though it has nothing to do with the current task. For example, if earlier in the conversation you discussed database schemas, and now you are asking about CSS styling, leftover database context can subtly steer the model's response in the wrong direction.
Context Clash
This happens when different parts of the context contradict each other. For instance, one instruction says "always use TypeScript" while a retrieved document shows examples in JavaScript. The model receives conflicting signals and may produce inconsistent output.
Hands-On: Context Engineering in Practice
Context Engineering Techniques
Now that we understand the problems, let us look at the techniques used to manage context effectively. These techniques are applied both by application developers (the people building AI agents) and by users (the people using those agents).
Technique 1: Context Selection (Write)
Not everything should go into the context. Context selection means carefully choosing what information to include. Application developers implement this by:
- Only including relevant file contents, not entire codebases
- Filtering tool outputs to extract the important parts
- Using semantic search to find and include only the most relevant documentation
As a user, you practice context selection every time you write a clear, specific prompt instead of a vague one. The more precise your request, the less noise enters the context.
Technique 2: Context Compression (Compress)
When context grows too large, it can be compressed. This means summarizing long outputs, trimming old conversation turns, or replacing detailed content with concise summaries.
For example, instead of keeping the full output of 500 test results in the context, an agent might compress it to:
This preserves the essential information while dramatically reducing token count.
Technique 3: Context Isolation (Isolate)
Context isolation means separating different tasks or sub-tasks into their own context windows. Instead of having one massive, ever-growing context, you split work into independent branches.
Claude Code uses this technique with sub-agents. When a complex task needs to explore a codebase or perform a side investigation, it spawns a sub-agent with its own fresh context. The sub-agent does its work and returns only the relevant result to the main context.
This prevents cross-contamination between unrelated tasks and keeps each context window focused.
Technique 4: Memory Systems (Remember)
Memory systems allow agents to persist important information across conversations without keeping everything in the active context window. Instead of relying solely on the conversation history, the agent can write facts to a memory store and retrieve them later when needed.
Common memory patterns include:
- Session memory — notes that last for the current conversation only
- User memory — preferences and patterns that persist across all conversations
- Repository memory — facts about a specific codebase or project
This way, the agent does not need to re-discover information it already learned, and the active context stays lean.
Technique 5: Instruction Files and Custom Context (User-Side)
Many modern AI agents allow users to inject persistent context through instruction files. For example, Claude Code uses CLAUDE.md files that are automatically loaded into the context at the start of every conversation.
These files let you define:
- Project-specific conventions and rules
- Preferred coding styles and patterns
- Build and test commands
- Architecture decisions
This is a powerful form of user-side context engineering. By writing good instruction files, you shape the context before the conversation even begins.
Step-by-Step Example
Tracing Context Through an Agent Interaction
Let us walk through a concrete example to see how context engineering works in practice. Imagine you are using a coding agent (like Claude Code) to fix a bug in a web application.
Step 1: Initial Context Assembly
Before the LLM sees your message, the agent harness assembles the initial context:
- System prompt — loaded by the agent developer, contains instructions on how the agent should behave
- Instruction files — the agent reads
CLAUDE.mdor similar files from the project, injecting project-specific rules - Memory — the agent loads any remembered facts from previous sessions
- User message — your actual request: "Fix the login bug — users get a 401 error after password reset"
At this point, the context is small, focused, and relevant.
Step 2: Tool Calls and Context Growth
The agent decides it needs more information. It makes several tool calls:
- Searches the codebase for authentication-related files
- Reads the login controller and the password reset handler
- Reads the relevant test file
- Runs the test suite to see the current failure
Each tool call adds output to the context. After these four calls, the context might contain 8,000 to 15,000 tokens of file contents and test output.
Step 3: Context Selection and Compression
A well-engineered agent does not keep everything. It applies context management:
- The search results return 20 files, but the agent only reads the 3 most relevant ones (selection)
- The test output is 2,000 lines, but the agent extracts only the 2 failing tests (compression)
- For a deeper investigation of the auth library, the agent spawns a sub-agent rather than dumping that exploration into the main context (isolation)
Step 4: Fix and Verify
With the right context in place, the agent:
- Identifies the bug — the password reset handler invalidates the session token but does not issue a new one
- Applies a fix to the code
- Runs the tests again to verify the fix works
Because the context was well-managed throughout this process, the agent stayed focused and produced the correct fix without hallucinating.
What Would Happen Without Context Engineering?
Without these techniques, the agent might have:
- Read all 20 files into context (exceeding the window or drowning out the important information)
- Kept the full 2,000-line test output, making it harder to focus on the failing tests
- Gotten confused by irrelevant code from earlier searches
- Hallucinated a fix based on the wrong file or wrong function
This is why context engineering is not optional — it is the difference between an agent that works and one that does not.
Summary
In this article, we covered the fundamental concepts of context engineering and why it matters for modern AI agents:
- Context engineering is the discipline of dynamically assembling, selecting, compressing, and managing the information sent to an LLM at each step of an agent's execution.
- It is the natural evolution of prompt engineering. While prompts are static, context is dynamic and requires a dynamic system to manage it.
- The agent harness — the orchestration layer around the LLM — is where most of the real engineering lives. It manages tools, state, memory, and context.
- Unmanaged context growth leads to exceeded window limits, increased costs, and degraded agent performance.
- Specific failure modes include context poisoning (hallucinations propagating forward), context confusion (irrelevant information steering responses), and context clash (contradictory instructions).
- Key techniques to manage context include selection (choose what to include), compression (summarize long outputs), isolation (separate tasks into independent contexts), and memory systems (persist facts across sessions).
- Both developers and users play a role in context engineering. Users influence context through clear prompts, good instruction files, and focused interactions.
Understanding context engineering gives you a powerful mental model for working with AI agents effectively. Whether you are building agents or using them, the quality of the context determines the quality of the output.