Generative AI for Beginners: AI Patterns and Applications in .NET
Created: 04 Apr 2026 | Updated: 04 Apr 2026

Retrieval-Augmented Generation (RAG)

In this lesson you will learn how to ground AI responses in your own data so that answers are accurate, current, and relevant to your specific domain — even when that information was never part of the model's training set.

The Knowledge Problem

Language models are trained on a fixed snapshot of the world. They cannot know about:

  1. Your organisation's internal documents and policies
  2. Events that occurred after their training cut-off
  3. Private data they were never trained on
  4. Domain-specific knowledge unique to your business

Retrieval-Augmented Generation (RAG) solves this by fetching relevant documents from a knowledge base at query time and injecting them into the prompt before asking the model to answer.

Part 1: How RAG Works

RAG has two main phases, retrieval followed by augmentation and generation, that execute every time a user asks a question.

Phase 1 — Retrieve

The user's question is converted into an embedding vector. That vector is compared against a pre-built index of document embeddings. The most semantically similar documents are returned.
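"Semantically similar" here means close under a vector distance metric, most commonly cosine similarity. The vector store computes this for you, but the underlying arithmetic is simple. A minimal sketch in plain C# (no SDK dependencies; the function name is ours, not a library API):

```csharp
using System;

// Cosine similarity between two vectors: 1.0 means the vectors point in the
// same direction, 0.0 means they are orthogonal (unrelated). Vector stores
// rank documents by a metric equivalent to this.
static double CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    double dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot  += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
}

double sim = CosineSimilarity(new float[] { 1, 0, 1 }, new float[] { 1, 0, 0 });
// Vectors pointing in similar directions score closer to 1.0 (here ≈ 0.71).
```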

Phase 2 — Augment and Generate

The retrieved documents are inserted into the system prompt as context. The chat model reads the question together with that context and generates a grounded answer.

The RAG Flow

User Question: "How do I cancel my gym membership?"
|
v
1. RETRIEVE
- Embed the question (text-embedding-3-small)
- Search the vector store (top-K nearest neighbours)
- Return matching documents

|
v
2. AUGMENT
- System: "Answer using ONLY the context below"
- Context: [retrieved documents injected here]
- User: original question

|
v
3. GENERATE
- Chat model reads context + question
- Produces a grounded, accurate answer

|
v
Answer: "Members may cancel by submitting a written
request 30 days in advance..."

Part 2: Building a RAG System

Our demo uses a fitness center member portal as its domain — completely separate from the standard customer-support examples you see in tutorials.

Step 1 — Define the Data Model

Decorate a plain C# class with attributes from Microsoft.Extensions.VectorData.

public class FitnessFaq
{
    [VectorStoreKey]
    public string DocId { get; set; } = string.Empty;

    [VectorStoreData]
    public string Title { get; set; } = string.Empty;

    [VectorStoreData]
    public string Content { get; set; } = string.Empty;

    [VectorStoreData]
    public string Category { get; set; } = string.Empty;

    [VectorStoreVector(1536)] // dimensions must match your embedding model
    public ReadOnlyMemory<float> Embedding { get; set; }
}
  1. [VectorStoreKey] — the unique identifier for each document
  2. [VectorStoreData] — plain metadata stored alongside the vector
  3. [VectorStoreVector(1536)] — the embedding field; dimension must match the model (text-embedding-3-small produces 1536-dimensional vectors)

Step 2 — Set Up the Services

var openAiClient = new OpenAIClient(apiKey);

// Embedding generator — converts text to vectors
IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator =
    openAiClient.GetEmbeddingClient("text-embedding-3-small")
        .AsIEmbeddingGenerator();

// Chat client — generates the final answer
IChatClient chatClient =
    openAiClient.GetChatClient("gpt-4o-mini")
        .AsIChatClient();

// In-memory vector store — index + search documents
var vectorStore = new InMemoryVectorStore();
var collection = vectorStore.GetCollection<string, FitnessFaq>("fitness-kb");
await collection.EnsureCollectionExistsAsync();

IEmbeddingGenerator and IChatClient are provider-neutral interfaces from Microsoft.Extensions.AI. Swapping from OpenAI to Azure OpenAI or a local Ollama model only requires changing the construction code — the rest of the RAG pipeline is identical.
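For example, a swap to Azure OpenAI is (roughly) only a different construction block. This sketch assumes the Azure.AI.OpenAI NuGet package; the endpoint, environment variable, and deployment names are placeholders you would replace with your own:

```csharp
using System;
using System.ClientModel;
using Azure.AI.OpenAI;
using Microsoft.Extensions.AI;

// Hypothetical Azure OpenAI setup — endpoint and deployment names are placeholders.
var azureClient = new AzureOpenAIClient(
    new Uri("https://your-resource.openai.azure.com"),
    new ApiKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!));

// Same provider-neutral interfaces — the rest of the RAG pipeline is unchanged.
IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator =
    azureClient.GetEmbeddingClient("my-embedding-deployment").AsIEmbeddingGenerator();

IChatClient chatClient =
    azureClient.GetChatClient("my-chat-deployment").AsIChatClient();
```

Only the client construction changes; everything downstream depends solely on the two interfaces.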

Step 3 — Populate the Knowledge Base

foreach (var doc in KnowledgeBase)
{
    var embedding = await embeddingGenerator.GenerateVectorAsync(doc.Content);
    await collection.UpsertAsync(new FitnessFaq
    {
        DocId = doc.DocId,
        Title = doc.Title,
        Category = doc.Category,
        Content = doc.Content,
        Embedding = embedding
    });
    Console.WriteLine($"Indexed: [{doc.Category}] {doc.Title}");
}

Each document's content text is embedded and stored alongside its metadata. This index step happens once (at application start, or as a background job when documents are added or updated).
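For larger knowledge bases, embedding one document per round trip is slow. `IEmbeddingGenerator` also defines a batch `GenerateAsync` method that accepts multiple inputs in one call. A sketch, reusing `KnowledgeBase` and `collection` from above (batch size limits vary by provider, so very large corpora may need to be split into smaller batches):

```csharp
// Batch-embed all document contents in a single request, then upsert.
var contents = KnowledgeBase.Select(d => d.Content).ToList();
var embeddings = await embeddingGenerator.GenerateAsync(contents);

for (int i = 0; i < contents.Count; i++)
{
    var doc = KnowledgeBase[i];
    await collection.UpsertAsync(new FitnessFaq
    {
        DocId = doc.DocId,
        Title = doc.Title,
        Category = doc.Category,
        Content = doc.Content,
        Embedding = embeddings[i].Vector // Embedding<float> exposes the raw vector
    });
}
```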

Step 4 — Retrieve Relevant Documents

var queryVector = await embeddingGenerator.GenerateVectorAsync(question);
var hits = collection.SearchAsync(queryVector, top: 2);

var docs = new List<FitnessFaq>();
await foreach (var hit in hits)
    docs.Add(hit.Record);

SearchAsync returns the top documents whose embedding vectors are closest to the query vector (cosine distance by default).
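Each hit also carries a relevance score, and when even the best match is weak it is often better to retrieve nothing than to inject noise. A sketch of score-threshold filtering (the exact semantics and range of `Score` depend on the store and distance function, and the threshold value here is purely illustrative):

```csharp
// Drop weak matches instead of padding the prompt with noise.
// Note: Score semantics vary by store and distance function — verify for yours.
const double minScore = 0.5; // illustrative threshold, tune per store
var docs = new List<FitnessFaq>();
await foreach (var hit in collection.SearchAsync(queryVector, top: 5))
{
    if (hit.Score is double score && score >= minScore)
        docs.Add(hit.Record);
}
```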

Step 5 — Build the Augmented Prompt

var contextBlock = string.Join("\n\n", docs.Select(d =>
    $"### {d.Title}\n{d.Content}"));

var prompt = $"""
You are a helpful fitness center staff assistant.
Answer the member's question using ONLY the information provided in the context below.
If the answer is not covered in the context, respond with:
"I don't have that information — please ask our front desk team."

## Context
{contextBlock}

## Member Question
{question}
""";

Four key constraints in the system prompt:

  1. Define the assistant's role clearly
  2. Restrict the model to only the provided context
  3. Tell it what to say when the answer is absent from the context
  4. Separate the context block from the user question visually

Step 6 — Generate the Answer

var response = await chatClient.GetResponseAsync(prompt);
return response.Text;

GetResponseAsync calls the chat model with the augmented prompt. The model reads the injected policy text and produces a grounded reply. A well-engineered prompt makes it far less likely, though not impossible, for the model to hallucinate details that are not in the context.
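For chat-style UIs, you can stream the answer token by token instead of waiting for the full response. Microsoft.Extensions.AI exposes this via `GetStreamingResponseAsync` on the same `IChatClient`:

```csharp
// Stream the grounded answer as it is generated, for a more responsive UI.
await foreach (var update in chatClient.GetStreamingResponseAsync(prompt))
{
    Console.Write(update.Text); // each update carries a fragment of the reply
}
Console.WriteLine();
```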

Part 3: Chunking Strategy for Large Documents

Embedding models have an input-token limit (typically 8,192 tokens for text-embedding-3-small). Long documents must be split into overlapping chunks so that each chunk is small enough to embed and specific enough to be retrieved precisely.

IEnumerable<string> ChunkByWords(string text, int chunkSize, int overlap)
{
    var words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    var step = chunkSize - overlap;
    for (int i = 0; i < words.Length; i += step)
    {
        var chunk = string.Join(" ", words.Skip(i).Take(chunkSize));
        if (!string.IsNullOrWhiteSpace(chunk))
            yield return chunk;
    }
}

The overlap parameter ensures that a sentence split across two chunk boundaries still appears in at least one complete chunk, preventing retrieval gaps.
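To see the window arithmetic concretely: with chunkSize = 25 and overlap = 5, the step is 20, so chunks start at words 0, 20, 40, and so on, and each chunk repeats the last 5 words of its predecessor. A tiny self-contained trace on a toy input:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same chunking logic as above, demonstrated on 10 words with chunkSize 4, overlap 2.
static IEnumerable<string> ChunkByWords(string text, int chunkSize, int overlap)
{
    var words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    var step = chunkSize - overlap; // windows advance by chunkSize - overlap words
    for (int i = 0; i < words.Length; i += step)
    {
        var chunk = string.Join(" ", words.Skip(i).Take(chunkSize));
        if (!string.IsNullOrWhiteSpace(chunk))
            yield return chunk;
    }
}

var chunks = ChunkByWords("w1 w2 w3 w4 w5 w6 w7 w8 w9 w10", chunkSize: 4, overlap: 2).ToList();
// Windows start at words 0, 2, 4, 6, 8, producing:
// "w1 w2 w3 w4", "w3 w4 w5 w6", "w5 w6 w7 w8", "w7 w8 w9 w10", "w9 w10"
```

Note how "w3 w4" appears in both the first and second chunks: that repetition is what prevents a sentence straddling a boundary from being lost.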

Chunking Trade-offs

Factor               | Smaller chunks                        | Larger chunks
Retrieval precision  | Higher — tightly focused matches      | Lower — noisy context
Context completeness | Lower — may miss surrounding context  | Higher — more background included
Token cost per query | Lower                                 | Higher
Index size           | Larger (more chunks to embed)         | Smaller

Part 4: RAG Best Practices

Retrieval Quality Levers

Technique          | Effect
Increase top-K     | More context provided; may add noise
Metadata filtering | Filter by category, date, or author before vector search
Reranking          | Score retrieved docs by a cross-encoder for higher precision
Hybrid search      | Combine vector search with keyword (BM25) search
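Metadata filtering with Microsoft.Extensions.VectorData looks roughly like the sketch below. The options type and the exact filter-expression support vary by connector and package version, so treat this as a shape to verify rather than a guaranteed API:

```csharp
// Pre-filter by category before the vector comparison, so only "Membership"
// documents are candidates for the nearest-neighbour search.
var options = new VectorSearchOptions<FitnessFaq>
{
    Filter = faq => faq.Category == "Membership"
};

await foreach (var hit in collection.SearchAsync(queryVector, top: 2, options))
{
    Console.WriteLine($"{hit.Record.Title} (score: {hit.Score})");
}
```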

Provider Flexibility

Because both IEmbeddingGenerator and IChatClient are abstractions, you can run the same RAG pipeline locally with Ollama:

// Swap in Ollama models — zero other changes required
IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator =
    ollamaClient.AsIEmbeddingGenerator("all-minilm");

IChatClient chatClient =
    ollamaClient.AsIChatClient("phi4-mini");

Production Vector Stores

Store               | Package                                           | Use Case
InMemory            | Microsoft.SemanticKernel.Connectors.InMemory      | Local dev and prototyping
Azure AI Search     | Microsoft.SemanticKernel.Connectors.AzureAISearch | Enterprise, hybrid search
Qdrant              | Microsoft.SemanticKernel.Connectors.Qdrant        | Open-source, self-hosted
Weaviate            | Microsoft.SemanticKernel.Connectors.Weaviate      | Open-source, GraphQL API
Postgres (pgvector) | Microsoft.SemanticKernel.Connectors.Postgres      | Existing Postgres databases

Let's Review: What You Learned

Concept               | Summary
The Knowledge Problem | LLMs only know what they were trained on; RAG fills the gap at query time
Retrieve phase        | Embed the question, search the vector store, return top-K documents
Augment phase         | Inject retrieved documents into the system prompt as context
Generate phase        | The chat model answers using only the injected context
Chunking              | Split large docs into overlapping word-windows before embedding
Provider flexibility  | IEmbeddingGenerator and IChatClient abstractions allow any backend

Quick Self-Check

  1. What are the three phases of a RAG pipeline?
  2. Why do we chunk large documents instead of embedding them whole?
  3. What should the model reply when the answer is not in the retrieved context?
  4. Which NuGet package provides InMemoryVectorStore?

Full Example

Complete, verbatim source of RagDemo.cs:

using Microsoft.Extensions.AI;
using Microsoft.Extensions.VectorData;
using Microsoft.SemanticKernel.Connectors.InMemory;
using OpenAI;

namespace MicrosoftAgentFrameworkLesson.ConsoleApp;

// ── Data model for the fitness center knowledge base ──────────────────────

public class FitnessFaq
{
    [VectorStoreKey]
    public string DocId { get; set; } = string.Empty;

    [VectorStoreData]
    public string Title { get; set; } = string.Empty;

    [VectorStoreData]
    public string Content { get; set; } = string.Empty;

    [VectorStoreData]
    public string Category { get; set; } = string.Empty;

    [VectorStoreVector(1536)]
    public ReadOnlyMemory<float> Embedding { get; set; }
}

// ── RAG demo class ────────────────────────────────────────────

public static class RagDemo
{
    // Fitness center knowledge base — 8 policy and service documents
    private static readonly (string DocId, string Title, string Category, string Content)[] KnowledgeBase =
    [
        ("mbr-001", "Membership Tiers and Pricing", "Membership",
            "We offer three membership tiers. Basic ($29/month) gives access to the gym floor and locker rooms. " +
            "Premium ($49/month) adds unlimited group classes and sauna access. " +
            "Elite ($79/month) includes two personal training sessions per month, monthly nutrition consultations, and priority booking for all facilities."),

        ("mbr-002", "Membership Cancellation Policy", "Membership",
            "Members may cancel at any time by submitting a written request 30 days in advance. " +
            "Monthly contracts require a 30-day notice period. Annual contracts carry a $75 early termination fee if cancelled before the 12-month term ends. " +
            "Freeze options are available for up to 3 continuous months per year at no extra charge."),

        ("mbr-003", "Guest Pass Rules", "Membership",
            "Premium and Elite members may bring one guest per visit, up to 3 times per month at no charge. " +
            "Basic members may purchase day passes for guests at $15 each. " +
            "All guests must sign a liability waiver at the front desk. Guests may not use the sauna, steam room, or reserve group class spots."),

        ("pt-001", "Personal Training Sessions", "Training",
            "Personal training sessions are 60 minutes long and can be booked via the mobile app or at the front desk. " +
            "Sessions must be cancelled at least 24 hours in advance to avoid being charged. " +
            "Each trainer's profile lists their specializations: strength, weight loss, sports rehabilitation, and athletic performance."),

        ("pt-002", "Injury Prevention and Medical Clearance", "Training",
            "All new members receive a complimentary 30-minute orientation with a certified trainer. " +
            "A functional movement screen is recommended before beginning any resistance training program. " +
            "Members recovering from surgery or injury must present a signed physician clearance form before using free weights or resistance machines."),

        ("cls-001", "Group Classes Schedule and Booking", "Classes",
            "Group classes run Monday through Saturday. Available formats include HIIT, yoga, spin, Pilates, and kickboxing. " +
            "Classes can be booked up to 7 days in advance through the member app. " +
            "Waitlists open automatically when classes reach capacity, and spots are assigned as cancellations occur within 2 hours of class start."),

        ("nut-001", "Nutrition Coaching Services", "Nutrition",
            "Elite members receive one 45-minute nutrition consultation per month included in their membership. " +
            "Additional sessions cost $60 each and are available to all membership tiers. " +
            "Our registered dietitians provide personalized meal plans, macro-tracking guidance, and evidence-based supplement recommendations aligned with each member's fitness goals."),

        ("eqp-001", "Equipment Usage and Safety Guidelines", "Facilities",
            "All members must wipe down equipment before and after each use using the provided sanitizing wipes. " +
            "Collars must be used on barbells at all times in the free weight area. Members under 16 must be accompanied by an adult guardian in the weight room. " +
            "Report malfunctioning equipment to the front desk immediately — do not continue to use it. Chalk is permitted only in the designated lifting platform zone."),
    ];

    public static async Task RunAsync()
    {
        var apiKey = Environment.GetEnvironmentVariable("OPEN_AI_KEY");
        if (string.IsNullOrWhiteSpace(apiKey))
        {
            Console.WriteLine("Please set the OPEN_AI_KEY environment variable.");
            return;
        }

        var openAiClient = new OpenAIClient(apiKey);

        IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator =
            openAiClient.GetEmbeddingClient("text-embedding-3-small")
                .AsIEmbeddingGenerator();

        IChatClient chatClient =
            openAiClient.GetChatClient("gpt-4o-mini")
                .AsIChatClient();

        // ── Part 1: Index the Knowledge Base ─────────────────────────────

        Console.WriteLine("═══════════════════════════════════════════════════════");
        Console.WriteLine(" PART 1: Indexing Fitness Center Knowledge Base");
        Console.WriteLine("═══════════════════════════════════════════════════════\n");

        var vectorStore = new InMemoryVectorStore();
        var collection = vectorStore.GetCollection<string, FitnessFaq>("fitness-kb");
        await collection.EnsureCollectionExistsAsync();

        foreach (var doc in KnowledgeBase)
        {
            var embedding = await embeddingGenerator.GenerateVectorAsync(doc.Content);
            await collection.UpsertAsync(new FitnessFaq
            {
                DocId = doc.DocId,
                Title = doc.Title,
                Category = doc.Category,
                Content = doc.Content,
                Embedding = embedding
            });
            Console.WriteLine($" Indexed: [{doc.Category,-12}] {doc.Title}");
        }

        Console.WriteLine($"\n Total documents indexed: {KnowledgeBase.Length}\n");

        // ── Part 2: RAG Pipeline (Retrieve → Augment → Generate) ─────────────

        Console.WriteLine("═══════════════════════════════════════════════════════");
        Console.WriteLine(" PART 2: Retrieve → Augment → Generate");
        Console.WriteLine("═══════════════════════════════════════════════════════\n");

        string[] memberQuestions =
        [
            "I want to stop my gym membership — what do I need to do?",
            "Can I bring a friend along when I work out?",
            "What nutrition help is available and how much does it cost?",
        ];

        // Local function captures collection, embeddingGenerator, and chatClient via closure
        async Task<string> AskAsync(string question, int topK = 2)
        {
            // 1. RETRIEVE — embed the question and search the knowledge base
            var queryVector = await embeddingGenerator.GenerateVectorAsync(question);
            var hits = collection.SearchAsync(queryVector, top: topK);

            var docs = new List<FitnessFaq>();
            await foreach (var hit in hits)
                docs.Add(hit.Record);

            Console.Write(" Retrieved: ");
            Console.WriteLine(string.Join(", ", docs.Select(d => d.Title)));

            // 2. AUGMENT — inject the retrieved documents into the prompt
            var contextBlock = string.Join("\n\n", docs.Select(d =>
                $"### {d.Title}\n{d.Content}"));

            var prompt = $"""
                You are a helpful fitness center staff assistant.
                Answer the member's question using ONLY the information provided in the context below.
                If the answer is not covered in the context, respond with:
                "I don't have that information — please ask our front desk team."

                ## Context
                {contextBlock}

                ## Member Question
                {question}
                """;

            // 3. GENERATE — call the chat model with the augmented prompt
            var response = await chatClient.GetResponseAsync(prompt);
            return response.Text;
        }

        foreach (var question in memberQuestions)
        {
            Console.WriteLine($"Question: {question}");
            var answer = await AskAsync(question);
            Console.WriteLine($"Answer: {answer}");
            Console.WriteLine(new string('─', 55));
        }

        // ── Part 3: Text Chunking for Large Documents ─────────────────────

        Console.WriteLine("\n═══════════════════════════════════════════════════════");
        Console.WriteLine(" PART 3: Chunking Strategy for Large Documents");
        Console.WriteLine("═══════════════════════════════════════════════════════\n");

        const string longHandbookSection =
            "The Fitness Center Employee Handbook covers operational procedures across six sections. " +
            "Section 1 explains shift scheduling, payroll processing, and overtime approval. " +
            "Section 2 outlines member interaction standards, de-escalation techniques, and complaint resolution. " +
            "Section 3 describes equipment maintenance intervals, safety inspection checklists, and vendor contact procedures. " +
            "Section 4 details emergency protocols for fire evacuation, medical incidents, and power outages. " +
            "Section 5 covers hygiene and sanitation standards for every facility area. " +
            "Section 6 defines disciplinary procedures and the performance review cycle for all staff grades.";

        var chunks = ChunkByWords(longHandbookSection, chunkSize: 25, overlap: 5).ToList();

        Console.WriteLine($" Document word count : {longHandbookSection.Split(' ').Length}");
        Console.WriteLine($" Chunk size : 25 words | Overlap: 5 words");
        Console.WriteLine($" Chunks produced : {chunks.Count}\n");

        for (int i = 0; i < chunks.Count; i++)
            Console.WriteLine($" Chunk {i + 1,2}: \"{chunks[i].Trim()}\"");

        Console.WriteLine();
    }

    private static IEnumerable<string> ChunkByWords(string text, int chunkSize, int overlap)
    {
        var words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        var step = chunkSize - overlap;
        for (int i = 0; i < words.Length; i += step)
        {
            var chunk = string.Join(" ", words.Skip(i).Take(chunkSize));
            if (!string.IsNullOrWhiteSpace(chunk))
                yield return chunk;
        }
    }
}