Production AI applications need more than basic model calls. They require caching to reduce costs, logging to debug issues, telemetry to monitor performance, and often custom behaviours like rate limiting or request counting. Microsoft.Extensions.AI solves this with a middleware pipeline pattern — the same concept used in ASP.NET Core.
How the Pipeline Works
Each middleware layer wraps the next one. When a request arrives, it passes through every layer in order; the response travels back through the same layers in reverse:
Request → ConfigureOptions → Cache Check → Timer → AI Model
Response ← ConfigureOptions ← Cache Store ← Timer ← AI Model
If the cache already has the answer, the request never reaches the AI model.
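Conceptually, each Use* call just wraps the current client in a new delegating client, with the first call ending up outermost. A hand-wired sketch of a two-layer pipeline (DistributedCachingChatClient is the class UseDistributedCache installs; TimingChatClient is the custom timer built in Part 6; the exact constructor shapes here are assumptions, not taken from the sample code):

```csharp
// Hand-wired equivalent of .UseDistributedCache(cache).UseTiming():
// the first Use* call ends up outermost, so the cache sees requests first.
IChatClient model    = CreateBaseClient();           // innermost: real model
IChatClient timed    = new TimingChatClient(model);  // wraps the model
IChatClient pipeline = new DistributedCachingChatClient(timed, cache);

// pipeline.GetResponseAsync(...) walks cache check, then timer, then model,
// and the response unwinds back through the same layers in reverse.
```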
Part 1 — ChatClientBuilder Pattern
Every pipeline starts with .AsBuilder() and ends with .Build(). Between those two calls you chain Use* methods to add middleware layers:
IChatClient simpleClient = new OpenAIClient(apiKey)
    .GetChatClient("gpt-4o-mini")
    .AsIChatClient()
    .AsBuilder() // Start building the pipeline
    .Build();    // No middleware — passes straight through

var response = await simpleClient.GetResponseAsync(
    "Give me a one-sentence description of how to make pancakes.");
Console.WriteLine(response.Text);
Even without middleware the builder pattern works; it simply returns the inner client unchanged. Every Use* method adds a layer that wraps the client beneath it.
Part 2 — Caching Responses
UseDistributedCache stores responses for identical prompts. The second time you ask the same question the answer returns instantly from memory:
var cache = new MemoryDistributedCache(
    Options.Create(new MemoryDistributedCacheOptions()));

IChatClient cachedClient = CreateBaseClient()
    .AsBuilder()
    .UseDistributedCache(cache)
    .Build();

string[] prompts =
[
    "Name three Italian pasta shapes.",
    "What spice gives curry its yellow color?",
    "Name three Italian pasta shapes." // duplicate — cached
];

foreach (var prompt in prompts)
{
    var sw = Stopwatch.StartNew();
    var cached = await cachedClient.GetResponseAsync(prompt);
    sw.Stop();
    Console.WriteLine($" [{sw.ElapsedMilliseconds,5} ms] {prompt}");
    Console.WriteLine($" → {cached.Text}");
}
The third prompt “Name three Italian pasta shapes.” returns in under 1 ms because it is served from the in-memory cache. In production you would substitute a distributed cache such as Redis.
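Because the pipeline only depends on the IDistributedCache abstraction, swapping in Redis changes nothing but the cache construction. A sketch, assuming the Microsoft.Extensions.Caching.StackExchangeRedis package and a Redis instance at localhost:6379 (neither is part of the sample above, and the "ai-cache:" prefix is a made-up name):

```csharp
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.StackExchangeRedis;
using Microsoft.Extensions.Options;

// Redis-backed IDistributedCache; assumes a reachable Redis instance.
IDistributedCache redisCache = new RedisCache(
    Options.Create(new RedisCacheOptions
    {
        Configuration = "localhost:6379",
        InstanceName = "ai-cache:" // key prefix, hypothetical name
    }));

// The pipeline code is identical; only the cache implementation changes.
IChatClient cachedClient = CreateBaseClient()
    .AsBuilder()
    .UseDistributedCache(redisCache)
    .Build();
```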
Part 3 — Logging Middleware
UseLogging records every request and response through the standard ILoggerFactory. This helps with debugging and auditing:
var loggerFactory = LoggerFactory.Create(builder =>
    builder.AddConsole().SetMinimumLevel(LogLevel.Debug));

IChatClient loggedClient = CreateBaseClient()
    .AsBuilder()
    .UseLogging(loggerFactory)
    .Build();

var response = await loggedClient.GetResponseAsync(
    "What temperature should I bake sourdough bread at?");
Console.WriteLine(response.Text);
The console shows structured log entries including the prompt text, token counts, and timing information — all handled automatically by the middleware.
Part 4 — Configuring Default Options
ConfigureOptions sets defaults that apply to every request made through the client. Individual requests can still override them:
IChatClient configuredClient = CreateBaseClient()
    .AsBuilder()
    .ConfigureOptions(options =>
    {
        options.Temperature = 0.2f;    // low creativity — factual answers
        options.MaxOutputTokens = 150; // keep answers short
    })
    .Build();

// Uses the defaults (low temperature, 150 tokens)
var response = await configuredClient.GetResponseAsync(
    "List five essential kitchen tools for a beginner cook.");

// Override for a specific request — high creativity
var creative = await configuredClient.GetResponseAsync(
    "Invent a creative fusion dessert combining Japanese and Mexican cuisine.",
    new ChatOptions { Temperature = 1.0f, MaxOutputTokens = 300 });
This is useful for setting organisation-wide guardrails while still allowing per-request flexibility.
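In a hosted application, those guardrails can be registered once through dependency injection. A sketch, assuming the AddChatClient hosting extension that ships with Microsoft.Extensions.AI (it returns a ChatClientBuilder, so the same Use* methods chain onto it):

```csharp
// Program.cs in a generic host or ASP.NET Core app (sketch).
// Parameterless UseDistributedCache/UseLogging overloads resolve
// IDistributedCache and ILoggerFactory from the service provider.
builder.Services.AddDistributedMemoryCache();
builder.Services.AddChatClient(sp =>
        new OpenAIClient(apiKey).GetChatClient("gpt-4o-mini").AsIChatClient())
    .ConfigureOptions(o => o.Temperature = 0.2f) // organisation-wide default
    .UseDistributedCache() // pulls IDistributedCache from DI
    .UseLogging();         // pulls ILoggerFactory from DI
```

Consumers then inject IChatClient and get the full pipeline, while individual call sites can still pass their own ChatOptions to override the defaults.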
Part 5 — Inline Custom Middleware
For simple cross-cutting concerns you can use the .Use() method with a delegate. The delegate receives the messages, options, a next function to call the next layer, and a cancellation token:
var requestCount = 0;

IChatClient countingClient = CreateBaseClient()
    .AsBuilder()
    .Use(async (messages, options, next, cancellation) =>
    {
        var current = Interlocked.Increment(ref requestCount);
        Console.WriteLine($" [Counter] Request #{current}");
        await next(messages, options, cancellation);
        Console.WriteLine($" [Counter] Request #{current} completed");
    })
    .Build();

await countingClient.GetResponseAsync("How do I julienne carrots?");
await countingClient.GetResponseAsync("What is blanching?");
await countingClient.GetResponseAsync("How long do I boil eggs for?");
Console.WriteLine($" Total requests made: {requestCount}"); // 3
This shared-delegate overload of .Use() handles both streaming and non-streaming calls. You call await next(...) to invoke the next layer, and the result flows through automatically. It is ideal for pre- and post-processing logic such as counting, logging, or header injection.
Part 6 — Custom DelegatingChatClient
For reusable middleware, create a class that extends DelegatingChatClient. Override GetResponseAsync and GetStreamingResponseAsync to add your logic before and after calling base:
public sealed class TimingChatClient : DelegatingChatClient
{
    public TimingChatClient(IChatClient innerClient) : base(innerClient) { }

    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        var sw = Stopwatch.StartNew();
        var response = await base.GetResponseAsync(messages, options, cancellationToken);
        sw.Stop();
        Console.WriteLine($" [Timer] Response completed in {sw.ElapsedMilliseconds} ms");
        return response;
    }

    public override async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var sw = Stopwatch.StartNew();
        await foreach (var update in base.GetStreamingResponseAsync(
            messages, options, cancellationToken))
        {
            yield return update;
        }
        sw.Stop();
        Console.WriteLine($"\n [Timer] Streaming completed in {sw.ElapsedMilliseconds} ms");
    }
}
Creating an Extension Method
An extension method makes the middleware easy to chain like the built-in ones:
public static class TimingExtensions
{
    public static ChatClientBuilder UseTiming(this ChatClientBuilder builder)
    {
        return builder.Use(innerClient => new TimingChatClient(innerClient));
    }
}

// Usage
IChatClient timedClient = CreateBaseClient()
    .AsBuilder()
    .UseTiming()
    .Build();
Part 7 — Combined Middleware Pipeline
The real power comes from stacking multiple middleware layers. Order matters: the first Use* call becomes the outermost layer, so it sees every request first and every response last:
IChatClient fullPipeline = CreateBaseClient()
    .AsBuilder()
    .ConfigureOptions(opts =>
    {
        opts.Temperature = 0.5f;
        opts.MaxOutputTokens = 200;
    })
    .UseDistributedCache(combinedCache) // Check cache first
    .UseTiming()                        // Measure response time
    .Build();

// First call — hits the API
var response = await fullPipeline.GetResponseAsync(
    "What are the five French mother sauces?");

// Second call — served instantly from cache
response = await fullPipeline.GetResponseAsync(
    "What are the five French mother sauces?");
Because the cache layer sits outside the timing layer, the second identical request is answered before it ever reaches the timer or the model: the cached response comes back with no timer output at all, demonstrating how middleware order shapes behaviour in the pipeline.
Summary
| Concept | Key Takeaway |
|---|---|
| ChatClientBuilder | Creates middleware pipelines with .AsBuilder() … .Build(). |
| UseDistributedCache | Caches identical prompts to reduce API calls and costs. |
| UseLogging | Records requests and responses through ILoggerFactory. |
| ConfigureOptions | Sets default Temperature, MaxOutputTokens, etc. for all requests. |
| Inline Middleware | .Use() with a delegate for lightweight cross-cutting concerns. |
| DelegatingChatClient | Full class for reusable, testable custom middleware. |
| Execution Order | Middleware runs in the order it is added; responses travel back in reverse. |
Full Example
using System.Diagnostics;
using System.Runtime.CompilerServices;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using OpenAI;

namespace MicrosoftAgentFrameworkLesson.ConsoleApp;

// ── Custom Middleware: Request Timing ─────────────────────
public sealed class TimingChatClient : DelegatingChatClient
{
    public TimingChatClient(IChatClient innerClient) : base(innerClient) { }

    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        var sw = Stopwatch.StartNew();
        var response = await base.GetResponseAsync(messages, options, cancellationToken);
        sw.Stop();
        Console.WriteLine($" [Timer] Response completed in {sw.ElapsedMilliseconds} ms");
        return response;
    }

    public override async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var sw = Stopwatch.StartNew();
        await foreach (var update in base.GetStreamingResponseAsync(
            messages, options, cancellationToken))
        {
            yield return update;
        }
        sw.Stop();
        Console.WriteLine($"\n [Timer] Streaming completed in {sw.ElapsedMilliseconds} ms");
    }
}

// ── Extension method for the custom middleware ────────────
public static class TimingExtensions
{
    public static ChatClientBuilder UseTiming(this ChatClientBuilder builder)
    {
        return builder.Use(innerClient => new TimingChatClient(innerClient));
    }
}

// ── Demo class ───────────────────────────────────────────
public static class MiddlewarePipelineDemo
{
    public static async Task RunAsync()
    {
        // 1. Retrieve API key
        var apiKey = Environment.GetEnvironmentVariable("OPEN_AI_KEY");
        if (string.IsNullOrWhiteSpace(apiKey))
        {
            Console.WriteLine("Please set the OPEN_AI_KEY environment variable.");
            return;
        }

        // Base client factory — reused across parts
        IChatClient CreateBaseClient() => new OpenAIClient(apiKey)
            .GetChatClient("gpt-4o-mini")
            .AsIChatClient();

        // ══════════════════════════════════════════════
        // PART 1 — ChatClientBuilder Pattern
        // ══════════════════════════════════════════════
        Console.WriteLine("═══ Part 1: ChatClientBuilder Pattern ═══\n");

        IChatClient simpleClient = CreateBaseClient()
            .AsBuilder() // Start building the pipeline
            .Build();    // No middleware — passes straight through

        var response = await simpleClient.GetResponseAsync(
            "Give me a one-sentence description of how to make pancakes.");
        Console.WriteLine(response.Text);

        // ══════════════════════════════════════════════
        // PART 2 — Caching Responses
        // ══════════════════════════════════════════════
        Console.WriteLine("\n═══ Part 2: Caching Responses ═══\n");

        var cache = new MemoryDistributedCache(
            Options.Create(new MemoryDistributedCacheOptions()));

        IChatClient cachedClient = CreateBaseClient()
            .AsBuilder()
            .UseDistributedCache(cache)
            .Build();

        // Three prompts — the third repeats the first and is served from cache
        string[] prompts =
        [
            "Name three Italian pasta shapes.",
            "What spice gives curry its yellow color?",
            "Name three Italian pasta shapes." // duplicate — cached
        ];

        foreach (var prompt in prompts)
        {
            var sw = Stopwatch.StartNew();
            var cached = await cachedClient.GetResponseAsync(prompt);
            sw.Stop();
            Console.WriteLine($" [{sw.ElapsedMilliseconds,5} ms] {prompt}");
            Console.WriteLine($" → {cached.Text}\n");
        }

        // ══════════════════════════════════════════════
        // PART 3 — Logging Middleware
        // ══════════════════════════════════════════════
        Console.WriteLine("═══ Part 3: Logging Middleware ═══\n");

        var loggerFactory = LoggerFactory.Create(builder =>
            builder.AddConsole().SetMinimumLevel(LogLevel.Debug));

        IChatClient loggedClient = CreateBaseClient()
            .AsBuilder()
            .UseLogging(loggerFactory)
            .Build();

        response = await loggedClient.GetResponseAsync(
            "What temperature should I bake sourdough bread at?");
        Console.WriteLine($"\n Answer: {response.Text}");

        // ══════════════════════════════════════════════
        // PART 4 — Configuring Default Options
        // ══════════════════════════════════════════════
        Console.WriteLine("\n═══ Part 4: ConfigureOptions — Defaults ═══\n");

        IChatClient configuredClient = CreateBaseClient()
            .AsBuilder()
            .ConfigureOptions(options =>
            {
                options.Temperature = 0.2f;    // low creativity — factual answers
                options.MaxOutputTokens = 150; // keep answers short
            })
            .Build();

        // Uses the defaults (low temperature, 150 tokens)
        response = await configuredClient.GetResponseAsync(
            "List five essential kitchen tools for a beginner cook.");
        Console.WriteLine($" Default (T=0.2, 150 tokens):\n {response.Text}\n");

        // Override for a specific request — high creativity for a recipe idea
        response = await configuredClient.GetResponseAsync(
            "Invent a creative fusion dessert combining Japanese and Mexican cuisine.",
            new ChatOptions { Temperature = 1.0f, MaxOutputTokens = 300 });
        Console.WriteLine($" Override (T=1.0, 300 tokens):\n {response.Text}");

        // ══════════════════════════════════════════════
        // PART 5 — Inline Custom Middleware
        // ══════════════════════════════════════════════
        Console.WriteLine("\n═══ Part 5: Inline Custom Middleware ═══\n");

        var requestCount = 0;

        IChatClient countingClient = CreateBaseClient()
            .AsBuilder()
            .Use(async (messages, options, next, cancellation) =>
            {
                var current = Interlocked.Increment(ref requestCount);
                Console.WriteLine($" [Counter] Request #{current} — " +
                    $"{messages.Count()} message(s)");
                await next(messages, options, cancellation);
                Console.WriteLine($" [Counter] Request #{current} completed");
            })
            .Build();

        await countingClient.GetResponseAsync("How do I julienne carrots?");
        await countingClient.GetResponseAsync("What is blanching?");
        await countingClient.GetResponseAsync("How long do I boil eggs for?");
        Console.WriteLine($"\n Total requests made: {requestCount}");

        // ══════════════════════════════════════════════
        // PART 6 — Custom DelegatingChatClient
        // ══════════════════════════════════════════════
        Console.WriteLine("\n═══ Part 6: Custom DelegatingChatClient (Timing) ═══\n");

        IChatClient timedClient = CreateBaseClient()
            .AsBuilder()
            .UseTiming() // Our custom extension method
            .Build();

        response = await timedClient.GetResponseAsync(
            "Explain the Maillard reaction in cooking in two sentences.");
        Console.WriteLine($" Answer: {response.Text}\n");

        // Streaming also gets timed
        Console.Write(" Streaming: ");
        await foreach (var update in timedClient.GetStreamingResponseAsync(
            "Give me a quick recipe for garlic butter shrimp."))
        {
            Console.Write(update.Text);
        }

        // ══════════════════════════════════════════════
        // PART 7 — Combined Pipeline
        // ══════════════════════════════════════════════
        Console.WriteLine("\n\n═══ Part 7: Combined Middleware Pipeline ═══\n");

        var combinedCache = new MemoryDistributedCache(
            Options.Create(new MemoryDistributedCacheOptions()));

        IChatClient fullPipeline = CreateBaseClient()
            .AsBuilder()
            .ConfigureOptions(opts =>
            {
                opts.Temperature = 0.5f;
                opts.MaxOutputTokens = 200;
            })
            .UseDistributedCache(combinedCache) // Check cache first
            .UseTiming()                        // Measure response time
            .Build();

        // Request → ConfigureOptions → Cache Check → Timer → AI Model
        Console.WriteLine(" First call (hits the API):");
        response = await fullPipeline.GetResponseAsync(
            "What are the five French mother sauces?");
        Console.WriteLine($" {response.Text}\n");

        Console.WriteLine(" Second call (same prompt — served from cache):");
        response = await fullPipeline.GetResponseAsync(
            "What are the five French mother sauces?");
        Console.WriteLine($" {response.Text}");
    }
}