Microsoft Agent Framework
Microsoft.Extensions.AI
Created: 27 Feb 2026
Updated: 27 Feb 2026
IChatClient: Cache Responses
DistributedCachingChatClient wraps any IChatClient and caches responses keyed on the conversation history. When an identical prompt is sent again, the cached answer is returned immediately — no API call is made. This saves cost, reduces latency, and is ideal for FAQ-style applications.
Key Concepts
1. Adding the Cache Layer
Use ChatClientBuilder.UseDistributedCache and pass any IDistributedCache implementation. For local/dev use, the in-memory cache from Microsoft.Extensions.Caching.Memory works well:
IDistributedCache cache = new MemoryDistributedCache(
Options.Create(new MemoryDistributedCacheOptions()));
IChatClient client = new ChatClientBuilder(
new OpenAIClient(apiKey).GetChatClient("gpt-4o-mini").AsIChatClient())
.UseDistributedCache(cache)
.Build();
2. Cache Hit vs. Miss
The first call with a given prompt hits the API and stores the response. Every subsequent call with the exact same prompt is served from cache. Matching is exact: a prompt that differs by even one character produces a new cache entry and a fresh API call. You can observe this with elapsed time, since cache hits return in near 0 ms:
var sw = Stopwatch.StartNew();
ChatResponse response = await client.GetResponseAsync("What is the default port for HTTPS?");
sw.Stop();
Console.WriteLine($"[{sw.ElapsedMilliseconds} ms] {response.Text}");
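The exact key derivation inside DistributedCachingChatClient is an implementation detail of Microsoft.Extensions.AI, but the idea behind it can be sketched in a few lines: hash the serialized request and use the hash as the cache key, so any difference in the prompt yields a different key and therefore a miss. The sketch below uses hypothetical names (ConceptualResponseCache is not part of the library):

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Conceptual sketch only. DistributedCachingChatClient's real keying also
// covers the full message list and chat options, but the principle is the
// same: identical input -> identical key -> cached answer, no model call.
static class ConceptualResponseCache
{
    private static readonly Dictionary<string, string> Store = new();
    public static int ModelCalls { get; private set; }

    public static string GetResponse(string prompt, Func<string, string> callModel)
    {
        // Key on a hash of the prompt; any change produces a different key.
        string key = Convert.ToHexString(
            SHA256.HashData(Encoding.UTF8.GetBytes(prompt)));

        if (Store.TryGetValue(key, out string? cached))
            return cached;                 // hit: no model call

        ModelCalls++;                      // miss: call the model and store
        string answer = callModel(prompt);
        Store[key] = answer;
        return answer;
    }
}
```

This also explains why two prompts that ask the same question in different words ("What is the default port for HTTPS?" vs. "What's the default port for HTTPS?") each incur an API call: they hash to different keys.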
3. Required Package
<PackageReference Include="Microsoft.Extensions.Caching.Memory" Version="9.0.2" />
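In addition to the caching package above, the full example also needs the Microsoft.Extensions.AI abstractions, the OpenAI adapter, and the OpenAI SDK itself. The package names below are my best understanding of the current NuGet packages (versions omitted; pin to the latest stable releases):

```xml
<ItemGroup>
  <PackageReference Include="Microsoft.Extensions.AI" />
  <PackageReference Include="Microsoft.Extensions.AI.OpenAI" />
  <PackageReference Include="OpenAI" />
</ItemGroup>
```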
Full Example
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;
using OpenAI;
using System.Diagnostics;
namespace MicrosoftAgentFrameworkLesson.ConsoleApp.ChatClient;
/// <summary>
/// Demonstrates DistributedCachingChatClient with in-memory cache.
/// Repeated identical prompts are served from cache — no API call made.
/// Scenario: IT FAQ where users often ask the same questions.
/// </summary>
public static class CachingDemo
{
public static async Task RunAsync()
{
var apiKey = Environment.GetEnvironmentVariable("OPEN_AI_KEY")
?? throw new InvalidOperationException("Set OPEN_AI_KEY environment variable.");
IDistributedCache cache = new MemoryDistributedCache(
Options.Create(new MemoryDistributedCacheOptions()));
IChatClient client = new ChatClientBuilder(
new OpenAIClient(apiKey).GetChatClient("gpt-4o-mini").AsIChatClient())
.UseDistributedCache(cache)
.Build();
Console.WriteLine("====== IChatClient — Cache Responses ======\n");
Console.WriteLine("Duplicate prompts are served from cache (much faster).\n");
string[] questions =
[
"What is the default port for HTTPS?",
"What is the default port for HTTPS?", // cache hit
"What does DNS stand for?",
"What does DNS stand for?", // cache hit
"What is the default port for HTTPS?" // cache hit again
];
foreach (var q in questions)
{
var sw = Stopwatch.StartNew();
ChatResponse response = await client.GetResponseAsync(q);
sw.Stop();
Console.WriteLine($"Q: {q}");
Console.WriteLine($"A: {response.Text}");
Console.WriteLine($" [{sw.ElapsedMilliseconds} ms]");
Console.WriteLine();
}
}
}
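Because the caching layer depends only on IDistributedCache, swapping the in-memory cache for a shared one is a small change, and cached responses are then reused across processes and instances. A sketch using the Redis implementation from the Microsoft.Extensions.Caching.StackExchangeRedis package (the connection string is a placeholder; requires a running Redis instance and a valid API key):

```csharp
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.StackExchangeRedis;
using Microsoft.Extensions.Options;
using OpenAI;

string apiKey = Environment.GetEnvironmentVariable("OPEN_AI_KEY")
    ?? throw new InvalidOperationException("Set OPEN_AI_KEY environment variable.");

// Redis-backed IDistributedCache; "localhost:6379" is a placeholder endpoint.
IDistributedCache cache = new RedisCache(Options.Create(
    new RedisCacheOptions { Configuration = "localhost:6379" }));

// The rest of the pipeline is unchanged from the in-memory version.
IChatClient client = new ChatClientBuilder(
        new OpenAIClient(apiKey).GetChatClient("gpt-4o-mini").AsIChatClient())
    .UseDistributedCache(cache)
    .Build();
```

In ASP.NET Core apps, the same effect is usually achieved by registering the cache in DI (for example via AddStackExchangeRedisCache) and letting UseDistributedCache resolve it from the service provider.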