Implementing Response Streaming in Semantic Kernel
In the world of Generative AI, latency is a significant challenge. Large Language Models (LLMs) can take several seconds to generate a complete response. For end-users, waiting for a "block" of text to appear all at once can feel sluggish.
Streaming solves this by delivering pieces of the message (tokens) as they are generated, creating a dynamic, "typing" effect that makes the application feel significantly more responsive.
1. What is Streaming in LLMs?
Traditionally, an application sends a request and waits for the entire response to be finalized before displaying anything. With streaming, the model pushes fragments of the content as soon as they are generated, typically delivered over server-sent events (SSE).
In Microsoft Semantic Kernel, this is handled via the GetStreamingChatMessageContentsAsync method, which returns an IAsyncEnumerable<StreamingChatMessageContent> that you consume with await foreach.
2. Implementation: Real-Time Token Delivery
The following example demonstrates a .NET console application that processes user input and streams the AI's response to the console character by character.
The Code
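A minimal sketch of such a console loop is shown below. The model ID and API key are placeholders, and the sketch assumes the Microsoft.SemanticKernel OpenAI connector package is installed:

```csharp
using System.Text;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Placeholder model and key -- replace with your own values.
var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion(modelId: "gpt-4o-mini", apiKey: "YOUR_API_KEY");
var kernel = builder.Build();

var chatService = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();

Console.Write("User > ");
history.AddUserMessage(Console.ReadLine() ?? string.Empty);

Console.Write("Assistant > ");
var fullMessage = new StringBuilder();

// Fragments are written to the console as they arrive, instead of
// waiting for the complete reply.
await foreach (var token in chatService.GetStreamingChatMessageContentsAsync(history))
{
    Console.Write(token.Content);
    fullMessage.Append(token.Content);
}
Console.WriteLine();

// The history only ever sees fragments, so record the assembled reply manually.
history.AddAssistantMessage(fullMessage.ToString());
```

Note that the final AddAssistantMessage call is what keeps the conversation context intact for the next turn, as discussed below.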
3. Key Technical Considerations
Concatenating the Response
When streaming, the ChatHistory object cannot automatically know what the full message was, because it only ever sees individual fragments. You must manually concatenate each fragment's token.Content into a string (e.g., fullMessage) and add it to the ChatHistory after the stream completes to maintain conversation context.
IAsyncEnumerable Pattern
The use of await foreach is the standard .NET pattern for handling asynchronous streams. This allows the thread to remain unblocked while waiting for the next token from the OpenAI servers.
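The pattern can be demonstrated without any network dependency. The sketch below substitutes a hypothetical FakeTokenStreamAsync iterator for the real Semantic Kernel call, but the consuming loop is identical in shape:

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

class StreamingDemo
{
    // Stand-in for GetStreamingChatMessageContentsAsync: yields one
    // fragment at a time, with a small delay mimicking network latency.
    static async IAsyncEnumerable<string> FakeTokenStreamAsync()
    {
        foreach (var token in new[] { "Hello", ", ", "world", "!" })
        {
            await Task.Delay(10);
            yield return token;
        }
    }

    static async Task Main()
    {
        var fullMessage = new StringBuilder();

        // await foreach suspends between fragments without blocking the thread.
        await foreach (var token in FakeTokenStreamAsync())
        {
            Console.Write(token);
            fullMessage.Append(token);
        }
        Console.WriteLine();
        Console.WriteLine($"Assembled: {fullMessage}");
    }
}
```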
Performance vs. Perceived Performance
While streaming doesn't necessarily make the model generate tokens faster, it drastically reduces the Time to First Token (TTFT). Users perceive the application as being faster because they can start reading the beginning of the answer while the end is still being computed.
4. Best Practices for Streaming
- UI Feedback: In web or desktop applications, use streaming to update the UI progressively. This prevents "frozen" loading screens.
- Error Handling: Remember that a stream can break midway due to network issues. Ensure your fullMessage logic can handle partial data.
- Tool Calling: When using Plugins, be aware that the model might stream "thought" tokens before deciding to call a function. Semantic Kernel's automatic function calling handles much of this complexity, but it's a factor to watch in custom implementations.
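The error-handling point can be sketched as a fragment like the one below. It reuses chatService and history objects of the kind shown earlier; catching HttpRequestException is an assumption here, as the actual exception type depends on the connector in use:

```csharp
var fullMessage = new StringBuilder();
try
{
    await foreach (var token in chatService.GetStreamingChatMessageContentsAsync(history))
    {
        Console.Write(token.Content);
        fullMessage.Append(token.Content);
    }
}
catch (HttpRequestException ex)
{
    // The connection dropped mid-stream; fullMessage holds the partial reply.
    Console.Error.WriteLine($"\nStream interrupted: {ex.Message}");
}

// Policy decision: persist the partial text, retry the request, or discard it.
if (fullMessage.Length > 0)
{
    history.AddAssistantMessage(fullMessage.ToString());
}
```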