Claude Code Foundations of Context Engineering Created: 13 Apr 2026 Updated: 13 Apr 2026

Context Engineering for AI Agents

If you have been working with AI agents — whether coding assistants like Cursor and Claude Code, or custom agents you built for your company — you have probably noticed something important: it all boils down to a prompt being sent to a Large Language Model (LLM), and a lot of engineering around it.

There is some truth in calling applications like Cursor and Claude Code "just wrappers around LLMs." However, building a really good wrapper requires deep knowledge and serious engineering work. The system that surrounds the LLM — often called an agent harness — is where most of the real engineering lives. It manages tool calls, controls the agent loop, handles errors, enforces guardrails, and, most importantly, decides what context is sent to the model at each step.

This article introduces context engineering, explains why it has become a critical concept when building and using modern AI agents, and shows how poor context handling leads to degraded performance, higher costs, hallucinations, and inconsistent behavior. By the end, you will have a clear mental model for context engineering and understand how it is applied in practice.

Core Concepts

What Is Context?

Every time you call an LLM, you send it a context window — a block of text that includes everything the model needs to generate a response. Think of it as the model's short-term memory: it can only "see" what you put in that window. Anything outside of it simply does not exist for the model.

Context can come from multiple places:

The developer of the application — system prompts, tool definitions, instructions baked into the agent harness
The user — the current message, preferences, custom instructions
Previous interactions — conversation history, tool call results, retrieved documents, external data

Every day, new sources of context are added. Memory systems, file contents, search results, database queries, API responses — all of these can become part of the context. And the amount of context keeps increasing.

The Agent Harness

The agent harness is the orchestration layer that wraps around the LLM. It is responsible for:

Managing tool calls and their outputs
Controlling the agent loop (deciding when to call the LLM again and when to stop)
Handling errors and retries
Enforcing guardrails and safety constraints
Deciding what context is sent to the model at each step

In practice, the model invocation itself is often straightforward. What determines whether an agent works reliably is how the surrounding harness manages state, tools, memory, and context. Most of the real engineering does not live inside the LLM call — it lives around it.

From Prompt Engineering to Context Engineering

In the early days of working with LLMs, we believed that prompt engineering was enough. We thought that writing carefully crafted prompts could fix problems and give us what we want. And for simple, single-turn tasks, that was often true.

The issue, however, is that prompts are static, while context is extremely dynamic. A static prompt cannot adapt to the changing state of a conversation, the growing results from tool calls, or the shifting needs of a multi-step task.

If context is dynamic, then constructing the correct context requires a dynamic system as well. It is no longer just about writing a clever prompt template. This is why we are entering the realm of context engineering — the natural evolution of prompt engineering, but a much deeper concept.

Aspect	Prompt Engineering	Context Engineering
Focus	Crafting the right instruction text	Building the right context dynamically
Nature	Static — written once, used many times	Dynamic — assembled at runtime
Scope	Single LLM call	Entire agent lifecycle (multi-turn, multi-tool)
Who controls it	Developer (mostly)	Developer, user, and the system itself
Techniques	Few-shot examples, role prompts, chain-of-thought	Context selection, compression, isolation, memory management

Why Context Matters — Garbage In, Garbage Out

We all know the saying "garbage in, garbage out." This is one of the most common reasons why agentic systems underperform. They are simply not provided with the right context.

LLMs cannot read our minds. We need to give them the right information. And it is not always just data — sometimes we need to give them the correct tools so they can fetch information, take actions, and perform tasks on our behalf.

Modern LLMs are getting better and better at reasoning. With tool calling, we can build AI agents that invoke tools, receive outputs, and loop until tasks are completed. This is extremely powerful, but it introduces a new challenge: context growth.

The Context Growth Problem

When an agent runs a long, complex task, it accumulates outputs from many tool calls. Each tool call adds results to the conversation. Each LLM response adds reasoning. Over multiple turns, the context window keeps growing, filled with tool call results and intermediate outputs.

Imagine an agent that needs to:

Read 10 files from a codebase
Run a test suite and collect the output
Search the web for documentation
Apply a fix and verify it works

By step 4, the context window may contain thousands of tokens from file contents, test outputs, search results, and the agent's own reasoning. Much of that content is no longer relevant, but it is still sitting in the context window, consuming tokens and influencing the model.

This leads to several problems:

Context window limit exceeded — The model simply cannot accept more input, and the agent breaks
Cost and latency increase — More tokens mean higher API costs and slower responses
Agent performance degrades — The model struggles to find the relevant information among irrelevant noise

If nothing is done, this degradation becomes unavoidable. The agent starts making worse decisions, hallucinating, or going in circles. This is not a hypothetical problem — it is the default behavior of any unmanaged context system.

Context Failures: Poisoning, Confusion, and Clash

When context is allowed to grow without structure, selection, or control, specific failure modes start to appear:

Context Poisoning

This happens when a hallucination from a previous tool call or LLM response enters the context and starts affecting future outputs. For example, if the agent hallucinates a function name that does not exist and that hallucination stays in the context, subsequent steps may reference and build upon that non-existent function. The error propagates forward.

Context Confusion

This occurs when irrelevant context influences the response, even though it has nothing to do with the current task. For example, if earlier in the conversation you discussed database schemas, and now you are asking about CSS styling, leftover database context can subtly steer the model's response in the wrong direction.

Context Clash

This happens when different parts of the context contradict each other. For instance, one instruction says "always use TypeScript" while a retrieved document shows examples in JavaScript. The model receives conflicting signals and may produce inconsistent output.

Hands-On: Context Engineering in Practice

Context Engineering Techniques

Now that we understand the problems, let us look at the techniques used to manage context effectively. These techniques are applied both by application developers (the people building AI agents) and by users (the people using those agents).

Technique 1: Context Selection (Write)

Not everything should go into the context. Context selection means carefully choosing what information to include. Application developers implement this by:

Only including relevant file contents, not entire codebases
Filtering tool outputs to extract the important parts
Using semantic search to find and include only the most relevant documentation

As a user, you practice context selection every time you write a clear, specific prompt instead of a vague one. The more precise your request, the less noise enters the context.

Technique 2: Context Compression (Compress)

When context grows too large, it can be compressed. This means summarizing long outputs, trimming old conversation turns, or replacing detailed content with concise summaries.

For example, instead of keeping the full output of 500 test results in the context, an agent might compress it to:

Test Results Summary:

- 498 tests passed

- 2 tests failed:

1. test_user_login — AssertionError: expected 200 but got 401

2. test_payment_flow — TimeoutError after 30s

This preserves the essential information while dramatically reducing token count.

Technique 3: Context Isolation (Isolate)

Context isolation means separating different tasks or sub-tasks into their own context windows. Instead of having one massive, ever-growing context, you split work into independent branches.

Claude Code uses this technique with sub-agents. When a complex task needs to explore a codebase or perform a side investigation, it spawns a sub-agent with its own fresh context. The sub-agent does its work and returns only the relevant result to the main context.

This prevents cross-contamination between unrelated tasks and keeps each context window focused.

Technique 4: Memory Systems (Remember)

Memory systems allow agents to persist important information across conversations without keeping everything in the active context window. Instead of relying solely on the conversation history, the agent can write facts to a memory store and retrieve them later when needed.

Common memory patterns include:

Session memory — notes that last for the current conversation only
User memory — preferences and patterns that persist across all conversations
Repository memory — facts about a specific codebase or project

This way, the agent does not need to re-discover information it already learned, and the active context stays lean.

Technique 5: Instruction Files and Custom Context (User-Side)

Many modern AI agents allow users to inject persistent context through instruction files. For example, Claude Code uses CLAUDE.md files that are automatically loaded into the context at the start of every conversation.

These files let you define:

Project-specific conventions and rules
Preferred coding styles and patterns
Build and test commands
Architecture decisions

This is a powerful form of user-side context engineering. By writing good instruction files, you shape the context before the conversation even begins.

Step-by-Step Example

Tracing Context Through an Agent Interaction

Let us walk through a concrete example to see how context engineering works in practice. Imagine you are using a coding agent (like Claude Code) to fix a bug in a web application.

Step 1: Initial Context Assembly

Before the LLM sees your message, the agent harness assembles the initial context:

System prompt — loaded by the agent developer, contains instructions on how the agent should behave
Instruction files — the agent reads CLAUDE.md or similar files from the project, injecting project-specific rules
Memory — the agent loads any remembered facts from previous sessions
User message — your actual request: "Fix the login bug — users get a 401 error after password reset"

At this point, the context is small, focused, and relevant.

Step 2: Tool Calls and Context Growth

The agent decides it needs more information. It makes several tool calls:

Searches the codebase for authentication-related files
Reads the login controller and the password reset handler
Reads the relevant test file
Runs the test suite to see the current failure

Each tool call adds output to the context. After these four calls, the context might contain 8,000 to 15,000 tokens of file contents and test output.

Step 3: Context Selection and Compression

A well-engineered agent does not keep everything. It applies context management:

The search results return 20 files, but the agent only reads the 3 most relevant ones (selection)
The test output is 2,000 lines, but the agent extracts only the 2 failing tests (compression)
For a deeper investigation of the auth library, the agent spawns a sub-agent rather than dumping that exploration into the main context (isolation)

Step 4: Fix and Verify

With the right context in place, the agent:

Identifies the bug — the password reset handler invalidates the session token but does not issue a new one
Applies a fix to the code
Runs the tests again to verify the fix works

Because the context was well-managed throughout this process, the agent stayed focused and produced the correct fix without hallucinating.

What Would Happen Without Context Engineering?

Without these techniques, the agent might have:

Read all 20 files into context (exceeding the window or drowning out the important information)
Kept the full 2,000-line test output, making it harder to focus on the failing tests
Gotten confused by irrelevant code from earlier searches
Hallucinated a fix based on the wrong file or wrong function

This is why context engineering is not optional — it is the difference between an agent that works and one that does not.

Summary

In this article, we covered the fundamental concepts of context engineering and why it matters for modern AI agents:

Context engineering is the discipline of dynamically assembling, selecting, compressing, and managing the information sent to an LLM at each step of an agent's execution.
It is the natural evolution of prompt engineering. While prompts are static, context is dynamic and requires a dynamic system to manage it.
The agent harness — the orchestration layer around the LLM — is where most of the real engineering lives. It manages tools, state, memory, and context.
Unmanaged context growth leads to exceeded window limits, increased costs, and degraded agent performance.
Specific failure modes include context poisoning (hallucinations propagating forward), context confusion (irrelevant information steering responses), and context clash (contradictory instructions).
Key techniques to manage context include selection (choose what to include), compression (summarize long outputs), isolation (separate tasks into independent contexts), and memory systems (persist facts across sessions).
Both developers and users play a role in context engineering. Users influence context through clear prompts, good instruction files, and focused interactions.

Understanding context engineering gives you a powerful mental model for working with AI agents effectively. Whether you are building agents or using them, the quality of the context determines the quality of the output.

Share this lesson:

Details

Navigation

Progress 1 / 18

Start 6% Complete

Writing Context and Persistent Memory

Statistics

6 Lessons in Foundations of Context Engineering

5 SubCategories in Claude Code

18 Total Lessons in Claude Code

dotnetacademy