Semantic Kernel | Created: 18 Jan 2026 | Updated: 18 Jan 2026

Implementing Persistent Context with Chat History in .NET

In a standard API call, Large Language Models (LLMs) are "stateless"—they do not remember previous interactions. To create a natural, conversational experience where the AI understands follow-up questions (e.g., "Tell me more about the first step"), developers must manually manage the conversation's state.

In Microsoft Semantic Kernel, this is achieved through the ChatHistory object.

1. The Anatomy of Chat History

Chat History is essentially a collection of messages categorized by Roles. Because the model itself is stateless, the entire history is sent back with every new prompt; this is what provides the "memory" needed to understand context.

  1. System Message: Sets the "rules of the road" and the persona (e.g., "You are a task organizer"). It usually stays at the top of the history.
  2. User Message: The input provided by the human.
  3. Assistant Message: The response generated by the AI. Saving this is crucial for the AI to remember what it previously said.
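The three roles above map directly onto helper methods of the ChatHistory class. A minimal sketch (the message text is purely illustrative):

```csharp
using Microsoft.SemanticKernel.ChatCompletion;

var history = new ChatHistory();

// System: persona and "rules of the road", kept at the top of the history
history.AddSystemMessage("You are a task organizer.");

// User: the human's input
history.AddUserMessage("Help me plan a product launch.");

// Assistant: a previous model reply, stored so the model can refer back to it
history.AddAssistantMessage("Step 1: Define your target audience...");
```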

2. Implementing a Stateful Loop

The following implementation demonstrates a continuous loop where the user can chat with the assistant, and the assistant maintains context using the ChatHistory class.

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// 1. Setup Kernel and Services
var apiKey = Environment.GetEnvironmentVariable("OPEN_AI_KEY")
    ?? throw new InvalidOperationException("OPEN_AI_KEY is not set.");
var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(modelId: "gpt-4o", apiKey: apiKey)
    .Build();

var chat = kernel.GetRequiredService<IChatCompletionService>();

// 2. Initialize the History with a System Role
var history = new ChatHistory();
history.AddSystemMessage("You are an AI assistant that breaks down complex tasks into steps.");

var executionSettings = new OpenAIPromptExecutionSettings { Temperature = 0.1 };

// 3. The Conversation Loop
while (true)
{
    Console.Write("User >>> ");
    var prompt = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(prompt)) break;

    // Add user input to the history
    history.AddUserMessage(prompt);

    // Send the entire history to the model
    var response = await chat.GetChatMessageContentAsync(history, executionSettings);
    Console.WriteLine($"Bot >>> {response.Content}");

    // Add the AI's response to history to maintain context for the next turn
    history.Add(response);
}

3. The Context Window Challenge

While adding every message to ChatHistory creates a great user experience, it introduces two technical constraints:

  1. Token Limits: Every LLM has a "Context Window" (e.g., 128k tokens for GPT-4o). If your history grows too long, requests will eventually exceed this limit and fail, or the earliest messages will have to be dropped, so the model "forgets" them.
  2. Increased Latency & Cost: Since you send the entire history with every new message, the number of input tokens grows with each turn (roughly quadratically over the life of the conversation), increasing both the cost and the time it takes for the model to process each request.
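One crude way to watch the budget is to estimate the token count from character length (roughly 4 characters per token for English text). This is a heuristic, not a real tokenizer, and the `MaxTokenBudget` value below is illustrative:

```csharp
using System.Linq;
using Microsoft.SemanticKernel.ChatCompletion;

// Rough heuristic: ~4 characters per token for English text.
// For exact counts you would run your model's actual tokenizer.
static int EstimateTokens(ChatHistory history) =>
    history.Sum(m => (m.Content?.Length ?? 0) / 4);

const int MaxTokenBudget = 100_000; // leave headroom below the 128k window

if (EstimateTokens(history) > MaxTokenBudget)
{
    // Time to summarize or trim the history (see Section 4)
}
```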

4. Best Practices for History Management

  1. Summarization: When history gets too long, ask the LLM to summarize the previous 10 messages and replace them with a single "Summary Message" to save tokens.
  2. Sliding Window: Only keep the last N messages in the ChatHistory object to stay within a safe token budget.
  3. Persistent Storage: For production apps, you should save the ChatHistory to a database (like SQL Server or CosmosDB) so users can resume their conversation across different sessions.
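The sliding-window strategy can be sketched as a small helper. This is a minimal sketch that assumes the system message is the first entry in the history and should always be preserved (ChatHistory implements IList<ChatMessageContent>, so indexed removal is available):

```csharp
using Microsoft.SemanticKernel.ChatCompletion;

// Keep the system message (index 0) plus at most `maxMessages` of the
// most recent entries; everything older is dropped from the context.
static void ApplySlidingWindow(ChatHistory history, int maxMessages)
{
    while (history.Count > maxMessages + 1)
    {
        history.RemoveAt(1); // remove the oldest non-system message
    }
}

// Usage: call after each turn, before sending the next request
ApplySlidingWindow(history, maxMessages: 20);
```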