Production AI applications need more than basic model calls. They require caching to reduce costs, logging to debug issues, telemetry to monitor performance, and often custom behaviours like rate limiting or request counting. Microsoft.Extensions.AI solves this with a middleware pipeline pattern — the same concept used in ASP.NET Core.
How the Pipeline Works
Each middleware layer wraps the next one. When a request arrives, it passes through every layer in order; the response travels back through the same layers in reverse:
Request → ConfigureOptions → Cache Check → Timer → AI Model
Response ← ConfigureOptions ← Cache Store ← Timer ← AI Model
If the cache already has the answer, the request never reaches the AI model.
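Conceptually, each Use* call just wraps the current client in a new delegating client, with the first call ending up outermost. A hand-wired sketch of a two-layer pipeline (DistributedCachingChatClient is the class UseDistributedCache installs; TimingChatClient is the custom timer built in Part 6; the exact constructor shapes here are assumptions, not taken from the sample code):

```csharp
// Hand-wired equivalent of .UseDistributedCache(cache).UseTiming():
// the first Use* call ends up outermost, so the cache sees requests first.
IChatClient model    = CreateBaseClient();           // innermost: real model
IChatClient timed    = new TimingChatClient(model);  // wraps the model
IChatClient pipeline = new DistributedCachingChatClient(timed, cache);

// pipeline.GetResponseAsync(...) walks cache check, then timer, then model,
// and the response unwinds back through the same layers in reverse.
```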
Part 1 — ChatClientBuilder Pattern
Every pipeline starts with .AsBuilder() and ends with .Build(). Between those two calls you chain Use* methods to add middleware layers:
IChatClient simpleClient = new OpenAIClient(apiKey)
    .GetChatClient("gpt-4o-mini")
    .AsIChatClient()
    .AsBuilder() // Start building the pipeline
    .Build();    // No middleware — passes straight through

var response = await simpleClient.GetResponseAsync(
    "Give me a one-sentence description of how to make pancakes.");
Console.WriteLine(response.Text);
Even without middleware the builder pattern works; it simply returns the inner client unchanged. Every Use* method adds a layer that wraps the client beneath it.
Part 2 — Caching Responses
UseDistributedCache stores responses for identical prompts. The second time you ask the same question the answer returns instantly from memory:
var cache = new MemoryDistributedCache(
    Options.Create(new MemoryDistributedCacheOptions()));

IChatClient cachedClient = CreateBaseClient()
    .AsBuilder()
    .UseDistributedCache(cache)
    .Build();

string[] prompts =
[
    "Name three Italian pasta shapes.",
    "What spice gives curry its yellow color?",
    "Name three Italian pasta shapes." // duplicate — cached
];

foreach (var prompt in prompts)
{
    var sw = Stopwatch.StartNew();
    var cached = await cachedClient.GetResponseAsync(prompt);
    sw.Stop();
    Console.WriteLine($" [{sw.ElapsedMilliseconds,5} ms] {prompt}");
    Console.WriteLine($" → {cached.Text}");
}
The third prompt “Name three Italian pasta shapes.” returns in under 1 ms because it is served from the in-memory cache. In production you would substitute a distributed cache such as Redis.
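Because the pipeline only depends on the IDistributedCache abstraction, swapping in Redis changes nothing but the cache construction. A sketch, assuming the Microsoft.Extensions.Caching.StackExchangeRedis package and a Redis instance at localhost:6379 (neither is part of the sample above, and the "ai-cache:" prefix is a made-up name):

```csharp
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.StackExchangeRedis;
using Microsoft.Extensions.Options;

// Redis-backed IDistributedCache; assumes a reachable Redis instance.
IDistributedCache redisCache = new RedisCache(
    Options.Create(new RedisCacheOptions
    {
        Configuration = "localhost:6379",
        InstanceName = "ai-cache:" // key prefix, hypothetical name
    }));

// The pipeline code is identical; only the cache implementation changes.
IChatClient cachedClient = CreateBaseClient()
    .AsBuilder()
    .UseDistributedCache(redisCache)
    .Build();
```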
Part 3 — Logging Middleware
UseLogging records every request and response through the standard ILoggerFactory. This helps with debugging and auditing:
var loggerFactory = LoggerFactory.Create(builder =>
    builder.AddConsole().SetMinimumLevel(LogLevel.Debug));

IChatClient loggedClient = CreateBaseClient()
    .AsBuilder()
    .UseLogging(loggerFactory)
    .Build();

var response = await loggedClient.GetResponseAsync(
    "What temperature should I bake sourdough bread at?");
Console.WriteLine(response.Text);
The console shows structured log entries including the prompt text, token counts, and timing information — all handled automatically by the middleware.
Part 4 — Configuring Default Options
ConfigureOptions sets defaults that apply to every request made through the client. Individual requests can still override them:
IChatClient configuredClient = CreateBaseClient()
    .AsBuilder()
    .ConfigureOptions(options =>
    {
        options.Temperature = 0.2f;    // low creativity — factual answers
        options.MaxOutputTokens = 150; // keep answers short
    })
    .Build();

// Uses the defaults (low temperature, 150 tokens)
var response = await configuredClient.GetResponseAsync(
    "List five essential kitchen tools for a beginner cook.");

// Override for a specific request — high creativity
var creative = await configuredClient.GetResponseAsync(
    "Invent a creative fusion dessert combining Japanese and Mexican cuisine.",
    new ChatOptions { Temperature = 1.0f, MaxOutputTokens = 300 });
This is useful for setting organisation-wide guardrails while still allowing per-request flexibility.
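In a hosted application, those guardrails can be registered once through dependency injection. A sketch, assuming the AddChatClient hosting extension that ships with Microsoft.Extensions.AI (it returns a ChatClientBuilder, so the same Use* methods chain onto it):

```csharp
// Program.cs in a generic host or ASP.NET Core app (sketch).
// Parameterless UseDistributedCache/UseLogging overloads resolve
// IDistributedCache and ILoggerFactory from the service provider.
builder.Services.AddDistributedMemoryCache();
builder.Services.AddChatClient(sp =>
        new OpenAIClient(apiKey).GetChatClient("gpt-4o-mini").AsIChatClient())
    .ConfigureOptions(o => o.Temperature = 0.2f) // organisation-wide default
    .UseDistributedCache() // pulls IDistributedCache from DI
    .UseLogging();         // pulls ILoggerFactory from DI
```

Consumers then inject IChatClient and get the full pipeline, while individual call sites can still pass their own ChatOptions to override the defaults.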
Part 5 — Inline Custom Middleware
For simple cross-cutting concerns you can use the .Use() method with a delegate. The delegate receives the messages, options, a next function to call the next layer, and a cancellation token:
var requestCount = 0;

IChatClient countingClient = CreateBaseClient()
    .AsBuilder()
    .Use(async (messages, options, next, cancellation) =>
    {
        var current = Interlocked.Increment(ref requestCount);
        Console.WriteLine($" [Counter] Request #{current}");
        await next(messages, options, cancellation);
        Console.WriteLine($" [Counter] Request #{current} completed");
    })
    .Build();

await countingClient.GetResponseAsync("How do I julienne carrots?");
await countingClient.GetResponseAsync("What is blanching?");
await countingClient.GetResponseAsync("How long do I boil eggs for?");
Console.WriteLine($" Total requests made: {requestCount}"); // 3
This shared-delegate overload of .Use() handles both streaming and non-streaming calls. You call await next(...) to invoke the next layer, and the result flows through automatically. It is ideal for pre- and post-processing logic such as counting, logging, or header injection.
Part 6 — Custom DelegatingChatClient
For reusable middleware, create a class that extends DelegatingChatClient. Override GetResponseAsync and GetStreamingResponseAsync to add your logic before and after calling base:
public sealed class TimingChatClient : DelegatingChatClient
{
    public TimingChatClient(IChatClient innerClient) : base(innerClient) { }

    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        var sw = Stopwatch.StartNew();
        var response = await base.GetResponseAsync(messages, options, cancellationToken);
        sw.Stop();
        Console.WriteLine($" [Timer] Response completed in {sw.ElapsedMilliseconds} ms");
        return response;
    }

    public override async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var sw = Stopwatch.StartNew();
        await foreach (var update in base.GetStreamingResponseAsync(
            messages, options, cancellationToken))
        {
            yield return update;
        }
        sw.Stop();
        Console.WriteLine($"\n [Timer] Streaming completed in {sw.ElapsedMilliseconds} ms");
    }
}
Creating an Extension Method
An extension method makes the middleware easy to chain like the built-in ones:
public static class TimingExtensions
{
    public static ChatClientBuilder UseTiming(this ChatClientBuilder builder)
    {
        return builder.Use(innerClient => new TimingChatClient(innerClient));
    }
}

// Usage
IChatClient timedClient = CreateBaseClient()
    .AsBuilder()
    .UseTiming()
    .Build();
Part 7 — Combined Middleware Pipeline
The real power comes from stacking multiple middleware layers. Order matters: the first Use* call becomes the outermost layer, so it sees every request first and every response last:
IChatClient fullPipeline = CreateBaseClient()
    .AsBuilder()
    .ConfigureOptions(opts =>
    {
        opts.Temperature = 0.5f;
        opts.MaxOutputTokens = 200;
    })
    .UseDistributedCache(combinedCache) // Check cache first
    .UseTiming()                        // Measure response time
    .Build();

// First call — hits the API
var response = await fullPipeline.GetResponseAsync(
    "What are the five French mother sauces?");

// Second call — served instantly from cache
response = await fullPipeline.GetResponseAsync(
    "What are the five French mother sauces?");
Because the cache layer sits outside the timing layer, the second identical request is answered before it ever reaches the timer or the model: the cached response comes back with no timer output at all, demonstrating how middleware order shapes behaviour in the pipeline.
Summary
| Concept | Key Takeaway |
|---|---|
| ChatClientBuilder | Creates middleware pipelines with .AsBuilder() … .Build(). |
| UseDistributedCache | Caches identical prompts to reduce API calls and costs. |
| UseLogging | Records requests and responses through ILoggerFactory. |
| ConfigureOptions | Sets default Temperature, MaxOutputTokens, etc. for all requests. |
| Inline Middleware | .Use() with a delegate for lightweight cross-cutting concerns. |
| DelegatingChatClient | Full class for reusable, testable custom middleware. |
| Execution Order | Middleware runs in the order it is added; responses travel back in reverse. |
Full Example
using System.Diagnostics;
using System.Runtime.CompilerServices;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using OpenAI;

namespace MicrosoftAgentFrameworkLesson.ConsoleApp;

// ── Custom Middleware: Request Timing ─────────────────────
public sealed class TimingChatClient : DelegatingChatClient
{
    public TimingChatClient(IChatClient innerClient) : base(innerClient) { }

    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        var sw = Stopwatch.StartNew();
        var response = await base.GetResponseAsync(messages, options, cancellationToken);
        sw.Stop();
        Console.WriteLine($" [Timer] Response completed in {sw.ElapsedMilliseconds} ms");
        return response;
    }

    public override async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var sw = Stopwatch.StartNew();
        await foreach (var update in base.GetStreamingResponseAsync(
            messages, options, cancellationToken))
        {
            yield return update;
        }
        sw.Stop();
        Console.WriteLine($"\n [Timer] Streaming completed in {sw.ElapsedMilliseconds} ms");
    }
}

// ── Extension method for the custom middleware ────────────
public static class TimingExtensions
{
    public static ChatClientBuilder UseTiming(this ChatClientBuilder builder)
    {
        return builder.Use(innerClient => new TimingChatClient(innerClient));
    }
}

// ── Demo class ───────────────────────────────────────────
public static class MiddlewarePipelineDemo
{
    public static async Task RunAsync()
    {
        // 1. Retrieve API key
        var apiKey = Environment.GetEnvironmentVariable("OPEN_AI_KEY");
        if (string.IsNullOrWhiteSpace(apiKey))
        {
            Console.WriteLine("Please set the OPEN_AI_KEY environment variable.");
            return;
        }

        // Base client factory — reused across parts
        IChatClient CreateBaseClient() => new OpenAIClient(apiKey)
            .GetChatClient("gpt-4o-mini")
            .AsIChatClient();

        // ══════════════════════════════════════════════
        // PART 1 — ChatClientBuilder Pattern
        // ══════════════════════════════════════════════
        Console.WriteLine("═══ Part 1: ChatClientBuilder Pattern ═══\n");

        IChatClient simpleClient = CreateBaseClient()
            .AsBuilder() // Start building the pipeline
            .Build();    // No middleware — passes straight through

        var response = await simpleClient.GetResponseAsync(
            "Give me a one-sentence description of how to make pancakes.");
        Console.WriteLine(response.Text);

        // ══════════════════════════════════════════════
        // PART 2 — Caching Responses
        // ══════════════════════════════════════════════
        Console.WriteLine("\n═══ Part 2: Caching Responses ═══\n");

        var cache = new MemoryDistributedCache(
            Options.Create(new MemoryDistributedCacheOptions()));

        IChatClient cachedClient = CreateBaseClient()
            .AsBuilder()
            .UseDistributedCache(cache)
            .Build();

        // Three prompts — the third repeats the first and is served from cache
        string[] prompts =
        [
            "Name three Italian pasta shapes.",
            "What spice gives curry its yellow color?",
            "Name three Italian pasta shapes." // duplicate — cached
        ];

        foreach (var prompt in prompts)
        {
            var sw = Stopwatch.StartNew();
            var cached = await cachedClient.GetResponseAsync(prompt);
            sw.Stop();
            Console.WriteLine($" [{sw.ElapsedMilliseconds,5} ms] {prompt}");
            Console.WriteLine($" → {cached.Text}\n");
        }

        // ══════════════════════════════════════════════
        // PART 3 — Logging Middleware
        // ══════════════════════════════════════════════
        Console.WriteLine("═══ Part 3: Logging Middleware ═══\n");

        var loggerFactory = LoggerFactory.Create(builder =>
            builder.AddConsole().SetMinimumLevel(LogLevel.Debug));

        IChatClient loggedClient = CreateBaseClient()
            .AsBuilder()
            .UseLogging(loggerFactory)
            .Build();

        response = await loggedClient.GetResponseAsync(
            "What temperature should I bake sourdough bread at?");
        Console.WriteLine($"\n Answer: {response.Text}");

        // ══════════════════════════════════════════════
        // PART 4 — Configuring Default Options
        // ══════════════════════════════════════════════
        Console.WriteLine("\n═══ Part 4: ConfigureOptions — Defaults ═══\n");

        IChatClient configuredClient = CreateBaseClient()
            .AsBuilder()
            .ConfigureOptions(options =>
            {
                options.Temperature = 0.2f;    // low creativity — factual answers
                options.MaxOutputTokens = 150; // keep answers short
            })
            .Build();

        // Uses the defaults (low temperature, 150 tokens)
        response = await configuredClient.GetResponseAsync(
            "List five essential kitchen tools for a beginner cook.");
        Console.WriteLine($" Default (T=0.2, 150 tokens):\n {response.Text}\n");

        // Override for a specific request — high creativity for a recipe idea
        response = await configuredClient.GetResponseAsync(
            "Invent a creative fusion dessert combining Japanese and Mexican cuisine.",
            new ChatOptions { Temperature = 1.0f, MaxOutputTokens = 300 });
        Console.WriteLine($" Override (T=1.0, 300 tokens):\n {response.Text}");

        // ══════════════════════════════════════════════
        // PART 5 — Inline Custom Middleware
        // ══════════════════════════════════════════════
        Console.WriteLine("\n═══ Part 5: Inline Custom Middleware ═══\n");

        var requestCount = 0;

        IChatClient countingClient = CreateBaseClient()
            .AsBuilder()
            .Use(async (messages, options, next, cancellation) =>
            {
                var current = Interlocked.Increment(ref requestCount);
                Console.WriteLine($" [Counter] Request #{current} — " +
                    $"{messages.Count()} message(s)");
                await next(messages, options, cancellation);
                Console.WriteLine($" [Counter] Request #{current} completed");
            })
            .Build();

        await countingClient.GetResponseAsync("How do I julienne carrots?");
        await countingClient.GetResponseAsync("What is blanching?");
        await countingClient.GetResponseAsync("How long do I boil eggs for?");
        Console.WriteLine($"\n Total requests made: {requestCount}");

        // ══════════════════════════════════════════════
        // PART 6 — Custom DelegatingChatClient
        // ══════════════════════════════════════════════
        Console.WriteLine("\n═══ Part 6: Custom DelegatingChatClient (Timing) ═══\n");

        IChatClient timedClient = CreateBaseClient()
            .AsBuilder()
            .UseTiming() // Our custom extension method
            .Build();

        response = await timedClient.GetResponseAsync(
            "Explain the Maillard reaction in cooking in two sentences.");
        Console.WriteLine($" Answer: {response.Text}\n");

        // Streaming also gets timed
        Console.Write(" Streaming: ");
        await foreach (var update in timedClient.GetStreamingResponseAsync(
            "Give me a quick recipe for garlic butter shrimp."))
        {
            Console.Write(update.Text);
        }

        // ══════════════════════════════════════════════
        // PART 7 — Combined Pipeline
        // ══════════════════════════════════════════════
        Console.WriteLine("\n\n═══ Part 7: Combined Middleware Pipeline ═══\n");

        var combinedCache = new MemoryDistributedCache(
            Options.Create(new MemoryDistributedCacheOptions()));

        IChatClient fullPipeline = CreateBaseClient()
            .AsBuilder()
            .ConfigureOptions(opts =>
            {
                opts.Temperature = 0.5f;
                opts.MaxOutputTokens = 200;
            })
            .UseDistributedCache(combinedCache) // Check cache first
            .UseTiming()                        // Measure response time
            .Build();

        // Request → ConfigureOptions → Cache Check → Timer → AI Model
        Console.WriteLine(" First call (hits the API):");
        response = await fullPipeline.GetResponseAsync(
            "What are the five French mother sauces?");
        Console.WriteLine($" {response.Text}\n");

        Console.WriteLine(" Second call (same prompt — served from cache):");
        response = await fullPipeline.GetResponseAsync(
            "What are the five French mother sauces?");
        Console.WriteLine($" {response.Text}");
    }
}