Implementing Persistent Context with Chat History in .NET
In a standard API call, Large Language Models (LLMs) are "stateless"—they do not remember previous interactions. To create a natural, conversational experience where the AI understands follow-up questions (e.g., "Tell me more about the first step"), developers must manually manage the conversation's state.
In Microsoft Semantic Kernel, this is achieved through the ChatHistory object.
1. The Anatomy of Chat History
Chat history is essentially a collection of messages categorized by role. The entire history is sent to the LLM with every new prompt, providing the "memory" it needs to understand context.
- System Message: Sets the "rules of the road" and the persona (e.g., "You are a task organizer"). It usually stays at the top of the history.
- User Message: The input provided by the human.
- Assistant Message: The response generated by the AI. Saving this is crucial for the AI to remember what it previously said.
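The three roles map directly onto convenience methods on Semantic Kernel's ChatHistory class. A minimal sketch (the message texts are illustrative):

```csharp
using System;
using Microsoft.SemanticKernel.ChatCompletion;

var history = new ChatHistory();

// System message: pins the persona and rules at the top of the history.
history.AddSystemMessage("You are a task organizer. Answer with numbered steps.");

// User message: the human's input.
history.AddUserMessage("How do I plan a product launch?");

// Assistant message: store the model's reply so it can reference it later.
history.AddAssistantMessage("1. Define the audience. 2. Set a date. 3. Prepare marketing.");

// The whole collection is what gets sent back to the model on the next turn.
foreach (var message in history)
    Console.WriteLine($"{message.Role}: {message.Content}");
```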
2. Implementing a Stateful Loop
The following implementation demonstrates a continuous loop where the user can chat with the assistant, and the assistant maintains context using the ChatHistory class.
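A minimal sketch of such a loop, assuming the Microsoft.SemanticKernel and OpenAI connector packages are installed; the model id and API key below are placeholders:

```csharp
using System;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Build a kernel with an OpenAI chat model (model id and key are placeholders).
var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(modelId: "gpt-4o", apiKey: "YOUR_API_KEY")
    .Build();

var chatService = kernel.GetRequiredService<IChatCompletionService>();

// The system message sets the persona and stays at the top of the history.
var history = new ChatHistory("You are a task organizer.");

while (true)
{
    Console.Write("You: ");
    var input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input)) break;

    // 1. Record the user's turn.
    history.AddUserMessage(input);

    // 2. Send the ENTIRE history so the model sees the full context.
    var reply = await chatService.GetChatMessageContentAsync(history);
    Console.WriteLine($"Assistant: {reply.Content}");

    // 3. Record the assistant's turn so it remembers what it said.
    history.AddAssistantMessage(reply.Content ?? string.Empty);
}
```

Step 3 is the one developers most often forget: without appending the assistant's own reply, the model has no record of what it previously said.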
3. The Context Window Challenge
While adding every message to ChatHistory creates a great user experience, it introduces two technical constraints:
- Token Limits: Every LLM has a "Context Window" (e.g., 128k tokens for GPT-4o). If your history becomes too long, the model will eventually hit its limit and fail or "forget" the earliest messages.
- Increased Latency & Cost: Since you send the entire history with every new message, the number of input tokens grows with every turn, increasing both the cost and the time it takes for the model to process each request.
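ChatHistory does not count tokens itself, so a budget check has to be approximated. A sketch using the common rough heuristic of ~4 characters per token (the EstimateTokens helper and the budget value are illustrative; use a real tokenizer library for accurate counts):

```csharp
using System;
using System.Linq;
using Microsoft.SemanticKernel.ChatCompletion;

// Rough heuristic: ~4 characters per token for English text.
// For accurate counts, use a real tokenizer package instead.
static int EstimateTokens(ChatHistory history) =>
    history.Sum(m => (m.Content?.Length ?? 0) / 4);

var history = new ChatHistory("You are a task organizer.");
history.AddUserMessage(new string('x', 8000)); // simulate a long message

// Illustrative threshold, not a real model limit.
const int TokenBudget = 1000;
if (EstimateTokens(history) > TokenBudget)
    Console.WriteLine("History is getting large - consider trimming or summarizing.");
```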
4. Best Practices for History Management
- Summarization: When history gets too long, ask the LLM to summarize the previous 10 messages and replace them with a single "Summary Message" to save tokens.
- Sliding Window: Only keep the last N messages in the ChatHistory object to stay within a safe token budget.
- Persistent Storage: For production apps, you should save the ChatHistory to a database (like SQL Server or Cosmos DB) so users can resume their conversation across different sessions.
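The sliding-window approach can be sketched in a few lines, assuming the system message sits at index 0 and should always survive trimming (the TrimHistory helper is hypothetical, not part of the Semantic Kernel API):

```csharp
using System;
using Microsoft.SemanticKernel.ChatCompletion;

// Keep the system message plus at most `maxMessages` recent messages.
static void TrimHistory(ChatHistory history, int maxMessages)
{
    // Index 0 is assumed to be the system message; never remove it.
    while (history.Count > maxMessages + 1)
        history.RemoveAt(1); // drop the oldest non-system message
}

var history = new ChatHistory("You are a task organizer.");
for (int i = 1; i <= 10; i++)
{
    history.AddUserMessage($"Question {i}");
    history.AddAssistantMessage($"Answer {i}");
}

TrimHistory(history, maxMessages: 6);
Console.WriteLine(history.Count); // 7: the system message + the last 6 messages
```

Dropping messages from index 1 rather than index 0 is the key design choice: the persona survives indefinitely while conversational turns age out.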