Microsoft Agent Framework Microsoft.Extensions.AI Created: 27 Feb 2026 Updated: 27 Feb 2026

IEmbeddingGenerator: Caching Pipeline

Just like IChatClient, IEmbeddingGenerator supports layered middleware pipelines via EmbeddingGeneratorBuilder. Adding UseDistributedCache caches each embedding keyed on the input text. Identical inputs skip the API call entirely and return the cached vector — saving cost and reducing latency significantly.

Key Concepts

1. EmbeddingGeneratorBuilder

Works the same way as ChatClientBuilder. Pass the inner generator to the constructor, chain Use* methods, and call Build():

IEmbeddingGenerator<string, Embedding<float>> generator =
    new EmbeddingGeneratorBuilder<string, Embedding<float>>(innerGenerator)
        .UseDistributedCache(cache)
        .Build();

2. Cache Key

The cache key is derived from the input text. If the same string is embedded twice, the second call is served from cache. Different strings always result in a fresh API call.
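One way to see this behavior without calling a real API is to wrap a stub generator that counts its own invocations. The sketch below is illustrative, not from the lesson: CountingGenerator is a hypothetical test double, and the vector it returns is a dummy value.

```csharp
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;

// Hypothetical inner generator that records how often it is actually called.
sealed class CountingGenerator : IEmbeddingGenerator<string, Embedding<float>>
{
    public int Calls { get; private set; }

    public Task<GeneratedEmbeddings<Embedding<float>>> GenerateAsync(
        IEnumerable<string> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        Calls++;
        // Dummy 3-dimensional vector; a real generator would call the API here.
        var embeddings = values.Select(_ => new Embedding<float>(new float[] { 1f, 2f, 3f }));
        return Task.FromResult(new GeneratedEmbeddings<Embedding<float>>(embeddings.ToList()));
    }

    public object? GetService(Type serviceType, object? serviceKey = null) => null;
    public void Dispose() { }
}

// Usage: embed the same string twice; only the first call reaches the inner generator.
var inner = new CountingGenerator();
IDistributedCache cache = new MemoryDistributedCache(
    Options.Create(new MemoryDistributedCacheOptions()));

IEmbeddingGenerator<string, Embedding<float>> generator =
    new EmbeddingGeneratorBuilder<string, Embedding<float>>(inner)
        .UseDistributedCache(cache)
        .Build();

await generator.GenerateAsync(["Return policy for online purchases"]);
await generator.GenerateAsync(["Return policy for online purchases"]); // served from cache

Console.WriteLine(inner.Calls); // expect 1 if the second call was a cache hit
```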

3. Choosing a Cache Backend

Any IDistributedCache implementation works:

  1. MemoryDistributedCache — local in-process, no setup, dev/test only.
  2. Redis — shared across multiple processes/instances, production use.
  3. SQL Server, Cosmos DB — persistent, survives restarts.

IDistributedCache cache = new MemoryDistributedCache(
    Options.Create(new MemoryDistributedCacheOptions()));
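For the Redis option, registration typically goes through dependency injection with the Microsoft.Extensions.Caching.StackExchangeRedis package. The sketch below assumes a local Redis at localhost:6379; the connection string and "embeddings:" prefix are placeholders, not values from the lesson.

```csharp
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// Registers RedisCache as the IDistributedCache implementation.
services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = "localhost:6379"; // placeholder connection string
    options.InstanceName = "embeddings:";     // optional key prefix
});

using var provider = services.BuildServiceProvider();

// This cache can then be passed to UseDistributedCache exactly like the in-memory one.
IDistributedCache cache = provider.GetRequiredService<IDistributedCache>();
```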

Full Example

using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;
using OpenAI;
using System.Diagnostics;

namespace MicrosoftAgentFrameworkLesson.ConsoleApp.Embeddings;

/// <summary>
/// Demonstrates EmbeddingGeneratorBuilder with DistributedCache middleware.
/// Repeated inputs are served from cache — no API call made.
/// Scenario: Document processing pipeline where the same phrases appear often.
/// </summary>
public static class EmbeddingCachingDemo
{
    public static async Task RunAsync()
    {
        var apiKey = Environment.GetEnvironmentVariable("OPEN_AI_KEY")
            ?? throw new InvalidOperationException("Set OPEN_AI_KEY environment variable.");

        IDistributedCache cache = new MemoryDistributedCache(
            Options.Create(new MemoryDistributedCacheOptions()));

        IEmbeddingGenerator<string, Embedding<float>> generator =
            new EmbeddingGeneratorBuilder<string, Embedding<float>>(
                new OpenAIClient(apiKey)
                    .GetEmbeddingClient("text-embedding-3-small")
                    .AsIEmbeddingGenerator())
                .UseDistributedCache(cache)
                .Build();

        Console.WriteLine("====== IEmbeddingGenerator — Caching Pipeline ======\n");
        Console.WriteLine("Duplicate inputs are served from cache (much faster).\n");

        string[] inputs =
        [
            "Return policy for online purchases",
            "Shipping times for international orders",
            "Return policy for online purchases",     // cache hit
            "Warranty coverage for electronics",
            "Shipping times for international orders" // cache hit
        ];

        foreach (var input in inputs)
        {
            var sw = Stopwatch.StartNew();
            GeneratedEmbeddings<Embedding<float>> result =
                await generator.GenerateAsync([input]);
            sw.Stop();

            Console.WriteLine($"Input : \"{input}\"");
            Console.WriteLine($" Dims : {result[0].Vector.Length} [{sw.ElapsedMilliseconds} ms]");
            Console.WriteLine();
        }
    }
}