Microsoft Agent Framework Concept (Chapter 2) · Created: 20 Apr 2026 · Updated: 20 Apr 2026

Embeddings: The Language of Meaning for LLMs

Large language models don't understand raw words, images, or audio — they only understand numbers. An embedding is the bridge between the two worlds: it is a numeric representation of non-numeric data (text, images, audio) that preserves semantic meaning. Two pieces of content with similar meaning end up with similar embedding vectors, even if they don't share any exact words.

Embeddings are the mechanism that lets an LLM compare concepts, find related documents, summarize text, translate between languages, and ground answers in your own private data. They are also what makes vector databases and retrieval-augmented generation (RAG) possible.

1. What Are Embeddings?

An embedding is a long array of floating-point numbers — a vector — produced by an embedding model. The model reads raw data such as a sentence or a paragraph and outputs a fixed-length list of numbers that encodes its meaning in a high-dimensional space.

For example, the OpenAI model text-embedding-3-small outputs a vector with 1,536 dimensions. Each dimension is a single float, so each piece of text is represented by exactly 1,536 numbers.

Raw text: "Elusive big cat with thick smoky-grey fur..."
Embedding: [0.013, -0.047, 0.091, -0.006, 0.028, ..., 0.034] // 1536 floats

You cannot read meaning from individual numbers — meaning only emerges from the relationships between whole vectors.

2. Capturing Semantic Relationships

Embedding models are trained on huge amounts of text, so concepts that appear in similar contexts end up close together in vector space. Think of the space as a map where every idea has a location.

  1. "Snow Leopard" and "Arctic Fox" land near each other — both are cold-climate predators.
  2. "Mango Tree" and "Pineapple Plant" cluster together — both are tropical fruit plants.
  3. "Snow Leopard" and "Mango Tree" land far apart — they share almost no semantic context.

The distance between two vectors (typically measured with cosine similarity) is what "similar meaning" looks like mathematically. Cosine similarity returns a value between -1 and 1, where 1 means identical direction (very similar meaning), 0 means unrelated, and -1 means opposite directions.
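To make this concrete, here is a minimal, self-contained sketch of the calculation on three-dimensional toy vectors. The vector values are invented purely for illustration; real embedding vectors work the same way, just with far more dimensions.

```csharp
using System;

// Cosine similarity: dot(a, b) / (|a| * |b|)
static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}

// Toy vectors -- invented for illustration, not real model output.
float[] cat     = { 1.0f, 0.2f, 0.0f };
float[] kitten  = { 0.9f, 0.3f, 0.1f }; // points almost the same way as "cat"
float[] tractor = { 0.0f, 0.1f, 1.0f }; // nearly orthogonal to "cat"

Console.WriteLine($"cat vs kitten : {CosineSimilarity(cat, kitten):F3}");  // 0.987
Console.WriteLine($"cat vs tractor: {CosineSimilarity(cat, tractor):F3}"); // 0.020
```

The same formula appears as the `CosineSimilarity` helper in the full example later in this chapter.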

3. Practical Applications

  Semantic Search: Find documents by meaning, not by matching keywords.
  Text Summarization: Identify which chunks of a long document carry the most core meaning, and keep only those.
  Classification: Group texts (spam vs ham, positive vs negative, topic A vs topic B) by their vector positions.
  Text-to-Image: Translate a text description into a visual concept that an image model can render.
  Recommendations: Suggest similar products, songs, or articles based on vector closeness.
  Retrieval-Augmented Generation (RAG): Fetch the most relevant chunks of your own data and inject them into an LLM prompt.

4. Semantic Memory and Vector Databases

Embeddings are usually pre-computed once and stored. A vector database (also called a vector store) is a database that is optimized to index these high-dimensional vectors and run nearest-neighbor searches quickly across millions or billions of entries.

A typical vector-search workflow looks like this:

  1. Generate an embedding for every record in your data set using an embedding model.
  2. Store the vectors — together with the original records — inside a vector database.
  3. When a user asks a question, convert that question into an embedding using the same model.
  4. Ask the vector database for the records whose vectors are closest to the query vector.
  5. Feed those records back into an LLM as grounded context to produce the final answer.

This is precisely what gives an LLM "long-term memory" about things it was never trained on — your internal docs, your product catalog, your support tickets.
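Step 5 is the part the full example later in this chapter stops short of: injecting the retrieved records into a prompt. A minimal sketch of that grounding step is shown below; the record shape, the sample texts, and the prompt wording are all illustrative assumptions, not a library API.

```csharp
using System;
using System.Linq;

// Illustrative records that a vector search would have returned (step 4).
var retrieved = new[]
{
    (Title: "Refund policy", Text: "Refunds are issued within 14 days of purchase."),
    (Title: "Shipping FAQ",  Text: "Standard shipping takes 3-5 business days."),
};

string question = "How long do refunds take?";

// Step 5: inject the retrieved chunks into the prompt as grounded context.
var context = string.Join("\n\n", retrieved.Select(r => $"[{r.Title}]\n{r.Text}"));
string prompt =
    "Answer the question using ONLY the context below.\n\n" +
    $"Context:\n{context}\n\n" +
    $"Question: {question}";

Console.WriteLine(prompt);
// The assembled prompt would then be sent to a chat model (e.g. via IChatClient).
```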

Embeddings in .NET

The Microsoft.Extensions.AI library provides an abstraction called IEmbeddingGenerator<string, Embedding<float>> that represents any embedding model. Combined with Microsoft.Extensions.VectorData and a connector like Microsoft.SemanticKernel.Connectors.InMemory, you can build a working semantic search pipeline with a handful of lines.

Install the packages:

dotnet add package Microsoft.Extensions.AI.OpenAI
dotnet add package Microsoft.SemanticKernel.Connectors.InMemory

Key Building Blocks

  IEmbeddingGenerator<string, Embedding<float>>: Abstraction over any embedding model (OpenAI, Azure OpenAI, local, etc.).
  GenerateAsync(IEnumerable<string>): Produces embedding vectors for a batch of strings in one call.
  GenerateVectorAsync(string): Convenience helper for embedding a single string and getting the raw vector.
  [VectorStoreKey] / [VectorStoreData] / [VectorStoreVector(n)]: Attributes that describe how a record maps into a vector store.
  InMemoryVectorStore: A lightweight in-process vector store, perfect for demos and tests.
  collection.SearchAsync(vector, top: N): Returns the top-N records whose vectors are closest to the query vector.

Full Example

The following demo builds a tiny wildlife sanctuary field guide. It takes eight entries — four animals and four fruit plants — turns every field note into a numeric vector, shows that animals cluster away from fruits in vector space, stores everything in an in-memory vector database, and runs natural-language semantic searches against it.

using Microsoft.Extensions.AI;
using Microsoft.Extensions.VectorData;
using Microsoft.SemanticKernel.Connectors.InMemory;
using OpenAI;

namespace MicrosoftAgentFrameworkLesson.ConsoleApp;

// ── Vector-store record for a wildlife species ─────────────

public class SpeciesRecord
{
    [VectorStoreKey]
    public int Id { get; set; }

    [VectorStoreData]
    public string CommonName { get; set; } = string.Empty;

    [VectorStoreData]
    public string Habitat { get; set; } = string.Empty;

    [VectorStoreData]
    public string FieldNote { get; set; } = string.Empty;

    [VectorStoreVector(1536)]
    public ReadOnlyMemory<float> Embedding { get; set; }
}

// ── Demo class ────────────────────────────────────────────

public static class EmbeddingsLanguageOfMeaningDemo
{
    // A wildlife sanctuary field guide — raw non-numeric text about animals and fruits
    private static readonly (int Id, string CommonName, string Habitat, string FieldNote)[] SpeciesCatalog =
    [
        (1, "Snow Leopard", "Himalayan Mountains",
            "Elusive big cat with thick smoky-grey fur that stalks blue sheep across cold alpine cliffs at high altitude."),
        (2, "Arctic Fox", "Tundra",
            "Small fluffy carnivore whose white winter coat helps it hunt lemmings and voles on the frozen polar plains."),
        (3, "Bengal Tiger", "Mangrove Forests",
            "Massive striped predator that swims between muddy tidal channels and ambushes deer in humid subtropical jungle."),
        (4, "Giant Panda", "Bamboo Forest",
            "Black-and-white bear that spends most of the day chewing bamboo shoots on cool misty mountain slopes in China."),
        (5, "Mango Tree", "Tropical Orchards",
            "Evergreen fruit tree that produces sweet orange pulpy drupes rich in vitamin C during the hot monsoon season."),
        (6, "Pineapple Plant", "Tropical Plantations",
            "Low spiky bromeliad that bears a single golden fruit with tough diamond-patterned skin and tangy juicy flesh."),
        (7, "Banana Palm", "Humid Lowlands",
            "Fast-growing herbaceous plant whose drooping bunches of soft yellow fruit ripen over many weeks in warm climates."),
        (8, "Strawberry Patch", "Temperate Gardens",
            "Low creeping plant that produces small heart-shaped red berries with tiny yellow seeds on their shiny surface."),
    ];

    public static async Task RunAsync()
    {
        var apiKey = Environment.GetEnvironmentVariable("OPEN_AI_KEY");
        if (string.IsNullOrWhiteSpace(apiKey))
        {
            Console.WriteLine("Please set the OPEN_AI_KEY environment variable.");
            return;
        }

        // Embedding generator — turns raw text into numeric vectors
        IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator =
            new OpenAIClient(apiKey)
                .GetEmbeddingClient("text-embedding-3-small")
                .AsIEmbeddingGenerator();

        // ══════════════════════════════════════════════
        // PART 1 — Turn Non-Numeric Text Into Vectors
        // ══════════════════════════════════════════════
        Console.WriteLine("═══ Part 1: Text → Numeric Vectors ═══\n");

        var fieldNotes = SpeciesCatalog.Select(s => s.FieldNote).ToArray();
        var vectors = await embeddingGenerator.GenerateAsync(fieldNotes);

        Console.WriteLine($"Generated {vectors.Count} embedding vectors.");
        Console.WriteLine($"Each vector has {vectors[0].Vector.Length} dimensions.");
        Console.WriteLine("First 6 numbers of the Snow Leopard vector:");
        var preview = vectors[0].Vector.Span;
        Console.Write(" [");
        for (int i = 0; i < 6; i++) Console.Write($"{preview[i]:F4}{(i < 5 ? ", " : "")}");
        Console.WriteLine(", ...]\n");

        // ══════════════════════════════════════════════
        // PART 2 — Semantic Relationships (Animals vs Fruits)
        // ══════════════════════════════════════════════
        Console.WriteLine("═══ Part 2: Semantic Distance Between Entries ═══\n");

        // Pick three reference entries: 2 animals + 1 fruit
        var snowLeopard = vectors[0].Vector; // Snow Leopard
        var arcticFox = vectors[1].Vector;   // Arctic Fox
        var mangoTree = vectors[4].Vector;   // Mango Tree

        float animalsSim = CosineSimilarity(snowLeopard, arcticFox);
        float crossSim = CosineSimilarity(snowLeopard, mangoTree);

        Console.WriteLine($"Similarity (Snow Leopard ↔ Arctic Fox) : {animalsSim:F3}");
        Console.WriteLine($"Similarity (Snow Leopard ↔ Mango Tree) : {crossSim:F3}");
        Console.WriteLine(animalsSim > crossSim
            ? "→ Animal vectors cluster closer together than the animal-vs-fruit pair.\n"
            : "→ Unexpected: cross-category pair was closer.\n");

        // ══════════════════════════════════════════════
        // PART 3 — Store Vectors in an In-Memory Vector DB
        // ══════════════════════════════════════════════
        Console.WriteLine("═══ Part 3: Long-Term Semantic Memory ═══\n");

        var vectorStore = new InMemoryVectorStore();
        var collection = vectorStore.GetCollection<int, SpeciesRecord>("species");
        await collection.EnsureCollectionExistsAsync();

        for (int i = 0; i < SpeciesCatalog.Length; i++)
        {
            var (id, common, habitat, note) = SpeciesCatalog[i];
            await collection.UpsertAsync(new SpeciesRecord
            {
                Id = id,
                CommonName = common,
                Habitat = habitat,
                FieldNote = note,
                Embedding = vectors[i].Vector,
            });
        }

        Console.WriteLine($"Indexed {SpeciesCatalog.Length} species in the vector store.\n");

        // ══════════════════════════════════════════════
        // PART 4 — Semantic Search Over the Field Guide
        // ══════════════════════════════════════════════
        Console.WriteLine("═══ Part 4: Semantic Search ═══\n");

        string[] naturalLanguageQueries =
        [
            "a predator that lives where it is very cold",
            "a sweet tropical fruit you can peel",
            "a camouflaged hunter in freezing terrain",
        ];

        foreach (var query in naturalLanguageQueries)
        {
            var queryVector = await embeddingGenerator.GenerateVectorAsync(query);
            var hits = collection.SearchAsync(queryVector, top: 2);

            Console.WriteLine($"Query: \"{query}\"");
            await foreach (var hit in hits)
            {
                Console.WriteLine($" [{hit.Score:F3}] {hit.Record.CommonName} ({hit.Record.Habitat})");
            }
            Console.WriteLine();
        }
    }

    // ── Cosine similarity helper ──────────────────────────

    private static float CosineSimilarity(ReadOnlyMemory<float> a, ReadOnlyMemory<float> b)
    {
        var spanA = a.Span;
        var spanB = b.Span;

        float dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < spanA.Length; i++)
        {
            dot += spanA[i] * spanB[i];
            normA += spanA[i] * spanA[i];
            normB += spanB[i] * spanB[i];
        }

        return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
    }
}

Key Takeaways

  1. Embeddings are numeric vectors that encode the meaning of non-numeric data.
  2. Similar concepts produce similar vectors — similarity is measured with cosine similarity.
  3. They power semantic search, classification, summarization, recommendations, text-to-image, and RAG.
  4. Vectors are stored in vector databases to give LLMs long-term semantic memory.
  5. In .NET, IEmbeddingGenerator<string, Embedding<float>> plus a vector-store connector is enough to build a full pipeline.

Reference

Embeddings in .NET — Microsoft Learn

