Microsoft Agent Framework Generative AI (Tutorial 1)
Created: 14 Apr 2026 | Updated: 14 Apr 2026

Context Window: The LLM's Working Memory

Introduction

When you interact with a large language model (LLM), it does not have unlimited memory. The context window is the maximum amount of text the model can "see" at one time. It includes everything: the system instructions, conversation history, your current message, any injected data, and the model's own generated response. All of this must fit within a single token budget.

Understanding the context window is critical for developers because it directly affects how much information you can include in prompts, how many turns a conversation can retain, and how the model handles large documents or datasets.

What Is a Context Window?

A context window is defined as the maximum number of tokens an LLM can process in a single request. Think of it as the model's working memory budget: everything the model reads and writes during one interaction must fit inside this window.

The context window is shared between input and output. If a model has a 128,000-token context window and your prompt uses 2,000 tokens, you have up to 126,000 tokens remaining for the response. In practice, you also reserve some capacity for internal overhead and to avoid hitting the hard limit.
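As a sketch of that arithmetic (the window size, prompt size, and safety margin here are illustrative numbers matching the example above):

```csharp
// Budget arithmetic for a single request: input and output share one window.
const int contextWindow = 128_000; // e.g. a GPT-4o-class model
int promptTokens = 2_000;          // system message + history + current message
int safetyMargin = 500;            // headroom so we never hit the hard limit

int maxOutputTokens = contextWindow - promptTokens - safetyMargin; // 125,500 here
Console.WriteLine($"Tokens available for the response: {maxOutputTokens:N0}");
```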

What Fills the Context Window?

Every request to an LLM fills the context window from several sources:

┌────────────────────────────────────────────────────────────┐
│                       CONTEXT WINDOW                       │
│                                                            │
│  ┌───────────────────┐   ┌──────────────────────────────┐  │
│  │ System            │   │ Conversation History         │  │
│  │ Instructions      │   │ (previous turns)             │  │
│  └───────────────────┘   └──────────────────────────────┘  │
│                                                            │
│  ┌───────────────────┐   ┌──────────────────────────────┐  │
│  │ Injected Data     │   │ User's Current               │  │
│  │ (RAG, tools, etc.)│   │ Message                      │  │
│  └───────────────────┘   └──────────────────────────────┘  │
│                                                            │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ Model's Generated Response                           │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘
| Source | Description |
| --- | --- |
| System instructions | The system prompt that defines the model's role and behavior |
| Conversation history | Previous user messages and assistant replies in multi-turn conversations |
| Injected data | Retrieved documents (RAG), tool descriptions, function results, or any data added to the prompt |
| User's current message | The latest prompt or question from the user |
| Generated response | The tokens the model produces as output; these also count against the window |

Context Window Sizes by Model

Context window sizes have grown dramatically over time. Early models were limited to 4,000 tokens, while modern models can process over 1 million tokens in a single request.

| Model | Context Window | Approximate Pages of Text |
| --- | --- | --- |
| GPT-3.5 Turbo | 16,385 tokens | ~20 pages |
| GPT-4 Turbo | 128,000 tokens | ~160 pages |
| GPT-4o | 128,000 tokens | ~160 pages |
| GPT-4o mini | 128,000 tokens | ~160 pages |
| GPT-4.1 | 1,047,576 tokens | ~1,300 pages |
| Claude 3.5 Sonnet | 200,000 tokens | ~250 pages |

Note: "pages" is a rough estimate assuming ~800 tokens per page of English text.

Why the Context Window Matters

1. Information Capacity

Larger context windows let you include more information in a single request. With a small context window you might only fit a short question. With a large context window you can include entire documents, database records, or lengthy conversation histories.

2. Multi-Turn Conversations

In a chatbot, every previous message (both user and assistant) is sent to the model on each turn. As the conversation grows, it consumes more of the context window. When the history exceeds the window, the oldest messages must be dropped or summarized — and the model "forgets" them.
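A sliding-window trim can be sketched in a few lines. The `ChatMessage` record and `TrimToBudget` helper below are illustrative, not part of any framework API; per-message token costs are measured with Microsoft.ML.Tokenizers:

```csharp
using Microsoft.ML.Tokenizers;

var tokenizer = TiktokenTokenizer.CreateForModel("gpt-4o");
var history = new List<ChatMessage>
{
    new("user", "Recommend a mystery novel."),
    new("assistant", "Try 'The Glass Census', a near-future mystery."),
    new("user", "Something with a scientific theme instead?")
};

// Keep the newest messages that fit a 1,000-token history budget.
List<ChatMessage> trimmed = TrimToBudget(history, tokenizer, 1_000);
Console.WriteLine($"Messages kept: {trimmed.Count}");

static List<ChatMessage> TrimToBudget(List<ChatMessage> history, Tokenizer tokenizer, int tokenBudget)
{
    var kept = new List<ChatMessage>();
    int used = 0;

    // Walk backwards from the newest message; stop when the budget is spent,
    // so the oldest turns are the ones that fall out of the window.
    for (int i = history.Count - 1; i >= 0; i--)
    {
        int cost = tokenizer.CountTokens(history[i].Content);
        if (used + cost > tokenBudget) break;
        used += cost;
        kept.Insert(0, history[i]); // insert at the front to keep chronological order
    }
    return kept;
}

record ChatMessage(string Role, string Content);
```

A real chatbot would also count role markers and per-message formatting overhead, which content-only counting slightly underestimates.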

3. RAG and Tool Use

Retrieval-Augmented Generation (RAG) injects retrieved documents into the prompt. Tool descriptions and function call results also consume tokens. A limited context window constrains how many documents or tool results you can provide.

4. Cost

LLM APIs charge per token. A longer prompt means more input tokens, which means higher cost. Sending the same large context on every request can become expensive. Efficient context management directly reduces operational costs.
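A back-of-the-envelope estimate makes this concrete (the per-million-token prices below are placeholder assumptions, not real rates; check your provider's current pricing):

```csharp
// Rough cost model: providers typically price input and output tokens separately.
const decimal inputPricePerMillion = 2.50m;   // assumed placeholder rate
const decimal outputPricePerMillion = 10.00m; // assumed placeholder rate

int inputTokens = 50_000; // e.g. a large injected document plus conversation history
int outputTokens = 1_000;

decimal cost = inputTokens * inputPricePerMillion / 1_000_000m
             + outputTokens * outputPricePerMillion / 1_000_000m; // 0.135 with these numbers
Console.WriteLine($"Estimated cost per request: {cost} USD");
```

Resending that same 50,000-token context on every turn multiplies the cost by the number of turns, which is why the management strategies in the next section matter.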

Strategies for Managing the Context Window

| Strategy | How It Works | Trade-off |
| --- | --- | --- |
| Trimming / Truncation | Cut old messages or long text to a token limit | Simple but loses information abruptly |
| Sliding Window | Keep only the most recent N messages | Easy to implement; oldest context is discarded |
| Summarization | Summarize older conversation turns into a compact block | Preserves key facts but costs an extra LLM call |
| Selective Injection (RAG) | Only inject the most relevant documents rather than all available data | Requires a retrieval pipeline; relevance depends on search quality |
| Chunking | Split large documents into smaller pieces and process them individually | Works well for extraction tasks; cross-chunk context can be lost |
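The chunking strategy can be sketched with GetIndexByTokenCount, the same method used later in this lesson; the `ChunkByTokens` helper is illustrative, and the chunk size is arbitrary:

```csharp
using Microsoft.ML.Tokenizers;

var tokenizer = TiktokenTokenizer.CreateForModel("gpt-4o");
string document = string.Concat(Enumerable.Repeat("The quick brown fox jumps over the lazy dog. ", 20));

// Split into pieces of at most 25 tokens each and process them one at a time.
foreach (string chunk in ChunkByTokens(document, tokenizer, 25))
{
    Console.WriteLine($"[{tokenizer.CountTokens(chunk),3} tokens] {chunk[..Math.Min(40, chunk.Length)]}...");
}

static IEnumerable<string> ChunkByTokens(string text, Tokenizer tokenizer, int chunkTokens)
{
    while (text.Length > 0)
    {
        // Character index that covers at most chunkTokens tokens from the start.
        int index = tokenizer.GetIndexByTokenCount(text, chunkTokens, out string? normalized, out _);
        string source = normalized ?? text; // tiktoken encodings do not normalize input
        if (index <= 0) yield break;        // guard against zero progress
        yield return source[..index];
        text = source[index..];
    }
}
```

A production chunker would usually also split on sentence or paragraph boundaries and overlap adjacent chunks so that context spanning a boundary is not lost.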

Counting Tokens in .NET

Before sending a request, you can measure exactly how many tokens your prompt will consume using the Microsoft.ML.Tokenizers library. This lets you:

  1. Calculate remaining context budget after system instructions and history
  2. Decide how many documents or records to inject
  3. Trim text precisely to a token boundary using GetIndexByTokenCount
  4. Estimate costs before making API calls

Install the packages first (the O200kBase data package provides the encoding used by gpt-4o):

dotnet add package Microsoft.ML.Tokenizers
dotnet add package Microsoft.ML.Tokenizers.Data.O200kBase
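Minimal usage, assuming the two packages above are installed:

```csharp
using Microsoft.ML.Tokenizers;

// Count tokens for a gpt-4o-style prompt before sending it anywhere.
Tokenizer tokenizer = TiktokenTokenizer.CreateForModel("gpt-4o");
int tokens = tokenizer.CountTokens("How many tokens does this sentence use?");
Console.WriteLine($"Token count: {tokens}");
```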

Full Example

using Microsoft.ML.Tokenizers;

namespace MicrosoftAgentFrameworkLesson.ConsoleApp;

public static class ContextWindowDemo
{
    // A library book catalog — completely unrelated to the official docs examples
    private static readonly Dictionary<string, string> BookCatalog = new()
    {
        ["The Midnight Garden"] = "A retired botanist discovers that her neglected rooftop garden blooms only under moonlight, attracting rare nocturnal pollinators from across the city. As she documents each species, she uncovers a forgotten Victorian seed vault hidden beneath the building's foundations.",
        ["Echoes of Iron"] = "Set in 1920s Pittsburgh, a young steelworker secretly photographs the brutal conditions inside the mills. When his images leak to the press, he becomes both a hero to the labor movement and a target for the industrialists who built the city.",
        ["The Cartographer's Daughter"] = "Born on a remote island, Lina inherits her father's passion for mapping uncharted coastlines. She sails along the jagged northern shores, cataloging every inlet and reef, only to find that the most dangerous territory lies within her own family's past.",
        ["Voltage"] = "A biomedical engineer develops a neural implant that restores movement to paralyzed patients. But when test subjects begin sharing the same vivid dream, she realizes the device is doing far more than stimulating motor neurons.",
        ["Flour & Flame"] = "After losing a Michelin-starred restaurant to a fire, a pastry chef rebuilds her career from a converted shipping container in a small harbor town. Her sourdough becomes legendary, but success threatens to pull her back into the high-pressure world she fled.",
        ["The Glass Census"] = "In a near-future city where every citizen wears a biometric bracelet, a census worker discovers that thousands of people simply do not exist in the system. Her investigation reveals a parallel community living entirely off-grid beneath the gleaming towers.",
        ["Canopy"] = "An arborist is hired to survey the oldest trees in a national forest scheduled for selective logging. Deep in the canopy, she finds carvings that predate European contact by centuries, forcing a collision between preservation science and indigenous land rights."
    };

    public static Task RunAsync()
    {
        Tokenizer tokenizer = TiktokenTokenizer.CreateForModel("gpt-4o");

        // ══════════════════════════════════════════════════
        // PART 1 — Visualize Context Window Budget
        // ══════════════════════════════════════════════════
        Console.WriteLine("═══ Part 1: Context Window Budget ═══\n");

        const int gpt4oContextWindow = 128_000;
        string systemMessage = "You are a helpful librarian assistant. Recommend books based on the reader's interests. Only recommend books from the catalog provided below.";
        string userMessage = "I enjoy mystery novels with strong female leads and scientific themes.";

        int systemTokens = tokenizer.CountTokens(systemMessage);
        int userTokens = tokenizer.CountTokens(userMessage);
        int reservedForOutput = 1_000; // reserve tokens for the model's response

        Console.WriteLine($"Model context window  : {gpt4oContextWindow:N0} tokens");
        Console.WriteLine($"System message        : {systemTokens} tokens");
        Console.WriteLine($"User message          : {userTokens} tokens");
        Console.WriteLine($"Reserved for output   : {reservedForOutput:N0} tokens");

        int availableForCatalog = gpt4oContextWindow - systemTokens - userTokens - reservedForOutput;
        Console.WriteLine($"Available for catalog : {availableForCatalog:N0} tokens");

        // ══════════════════════════════════════════════════
        // PART 2 — Measure Token Cost Per Book
        // ══════════════════════════════════════════════════
        Console.WriteLine("\n═══ Part 2: Token Cost Per Book ═══\n");

        Console.WriteLine($"{"Book Title",-35} {"Summary Chars",14} {"Tokens",7}");
        Console.WriteLine(new string('─', 60));

        int totalCatalogTokens = 0;
        foreach ((string title, string summary) in BookCatalog)
        {
            string entry = $"Title: {title}\nSummary: {summary}\n";
            int entryTokens = tokenizer.CountTokens(entry);
            totalCatalogTokens += entryTokens;

            string displayTitle = title.Length > 32 ? title[..32] + "..." : title;
            Console.WriteLine($"{displayTitle,-35} {summary.Length,14} {entryTokens,7}");
        }

        Console.WriteLine(new string('─', 60));
        Console.WriteLine($"{"TOTAL",-35} {"",14} {totalCatalogTokens,7}");
        Console.WriteLine($"\nFull catalog fits in budget: {(totalCatalogTokens <= availableForCatalog ? "YES" : "NO")}");

        // ══════════════════════════════════════════════════
        // PART 3 — Simulate a Small Context Window
        // ══════════════════════════════════════════════════
        Console.WriteLine("\n═══ Part 3: Simulating a Small Context Window (500 tokens) ═══\n");

        const int tinyContextWindow = 500;
        const int tinyReservedForOutput = 100; // a 1,000-token reserve would exceed the whole window
        int tinyAvailable = tinyContextWindow - systemTokens - userTokens - tinyReservedForOutput;
        Console.WriteLine($"Tiny context window : {tinyContextWindow} tokens");
        Console.WriteLine($"After system + user + output reserve : {tinyAvailable} tokens available for catalog\n");

        // Greedily pack books until the budget runs out
        int usedTokens = 0;
        int booksIncluded = 0;
        List<string> includedTitles = [];
        List<string> excludedTitles = [];

        foreach ((string title, string summary) in BookCatalog)
        {
            string entry = $"Title: {title}\nSummary: {summary}\n";
            int entryTokens = tokenizer.CountTokens(entry);

            if (usedTokens + entryTokens <= tinyAvailable)
            {
                usedTokens += entryTokens;
                booksIncluded++;
                includedTitles.Add(title);
            }
            else
            {
                excludedTitles.Add(title);
            }
        }

        Console.WriteLine($"Books that fit     : {booksIncluded} / {BookCatalog.Count}");
        Console.WriteLine($"Tokens used        : {usedTokens} / {tinyAvailable}");
        Console.WriteLine($"Included           : {string.Join(", ", includedTitles)}");
        Console.WriteLine($"Excluded (no room) : {string.Join(", ", excludedTitles)}");

        // ══════════════════════════════════════════════════
        // PART 4 — Trimming Strategy with GetIndexByTokenCount
        // ══════════════════════════════════════════════════
        Console.WriteLine("\n═══ Part 4: Trimming Long Text to Fit ═══\n");

        string fullCatalogText = string.Join("\n\n", BookCatalog.Select(b => $"Title: {b.Key}\nSummary: {b.Value}"));
        int fullTokens = tokenizer.CountTokens(fullCatalogText);
        Console.WriteLine($"Full catalog text tokens : {fullTokens}");

        const int trimTarget = 200;
        int trimIndex = tokenizer.GetIndexByTokenCount(fullCatalogText, trimTarget, out string? processedText, out _);
        processedText ??= fullCatalogText;
        string trimmedText = processedText[..trimIndex];
        int trimmedTokens = tokenizer.CountTokens(trimmedText);

        Console.WriteLine($"Trimmed to ~{trimTarget} tokens : {trimmedTokens} actual tokens");
        Console.WriteLine("Trimmed text preview :\n");
        Console.WriteLine(trimmedText);
        Console.WriteLine("\n    [... truncated ...]");

        // ══════════════════════════════════════════════════
        // PART 5 — Context Window Comparison Across Models
        // ══════════════════════════════════════════════════
        Console.WriteLine("\n═══ Part 5: Context Window Comparison ═══\n");

        var models = new (string Name, int ContextWindow)[]
        {
            ("GPT-3.5 Turbo", 16_385),
            ("GPT-4 Turbo", 128_000),
            ("GPT-4o", 128_000),
            ("GPT-4o mini", 128_000),
            ("GPT-4.1", 1_047_576),
            ("Claude 3.5 Sonnet", 200_000)
        };

        int promptOverhead = systemTokens + userTokens + totalCatalogTokens;
        Console.WriteLine($"{"Model",-25} {"Context Window",15} {"After Prompt",15} {"Utilization",12}");
        Console.WriteLine(new string('─', 70));

        foreach ((string name, int window) in models)
        {
            int remaining = window - promptOverhead;
            double utilization = (double)promptOverhead / window * 100;
            Console.WriteLine($"{name,-25} {window,15:N0} {remaining,15:N0} {utilization,10:F2} %");
        }

        return Task.CompletedTask;
    }
}

Reference

LLM Fundamentals - Microsoft Learn

