Multimodal Input with Agents
1. Introduction
The Microsoft Agent Framework supports multimodal input — you can send images alongside text to an agent, and the agent can analyze and respond to the image content. This opens up use cases like image description, visual comparison, document analysis, and more.
In this lesson, you will learn how to create a ChatMessage that includes both text and image content using TextContent and UriContent. The agent (backed by a vision-capable model like gpt-4o) can then analyze the image and respond accordingly.
2. Prerequisites
- .NET 10 SDK installed
- An OpenAI API key (set the `OPEN_AI_KEY` environment variable)
- A vision-capable model (e.g., `gpt-4o`)
- The following NuGet packages:
  - `Microsoft.Agents.AI`
  - `Microsoft.Extensions.AI.OpenAI`
3. Core Concepts
3.1. ChatMessage with Mixed Content
A ChatMessage can contain multiple content items. For multimodal input, you combine TextContent (your text prompt) with UriContent (an image URL) in a single message.
3.2. Content Types
| Type | Description | Use Case |
|---|---|---|
| `TextContent` | Plain text content in a message | Prompts, instructions, questions |
| `UriContent` | Content referenced by a URI (URL) | Images from the web, publicly accessible files |
| `DataContent` | Raw binary data (e.g., base64-encoded) | Local images, generated images |
3.3. Creating a Vision Agent
Any agent backed by a vision-capable model can process images. You create one the same way as a text agent — no special configuration is required for image support:
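A minimal sketch of such an agent, assuming the `OPEN_AI_KEY` environment variable is set and using the `AsIChatClient()`/`CreateAIAgent()` extensions from the packages listed above (the instructions string is illustrative; the step-by-step section below breaks this into parts):

```csharp
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using OpenAI;

// No image-specific configuration — a vision-capable model is all that is required.
AIAgent agent = new OpenAIClient(Environment.GetEnvironmentVariable("OPEN_AI_KEY"))
    .GetChatClient("gpt-4o")
    .AsIChatClient()
    .CreateAIAgent(instructions: "You are a helpful assistant that analyzes images.");
```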
4. Step-by-Step: Passing Images to an Agent
Step 1 — Create the Chat Client
Create a chat client using a vision-capable model like gpt-4o:
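One way this step might look, assuming the API key is read from the `OPEN_AI_KEY` environment variable:

```csharp
using Microsoft.Extensions.AI;
using OpenAI;

// Wrap the OpenAI chat client as an IChatClient for the agent framework.
IChatClient chatClient = new OpenAIClient(Environment.GetEnvironmentVariable("OPEN_AI_KEY"))
    .GetChatClient("gpt-4o")
    .AsIChatClient();
```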
Step 2 — Create the Agent
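A sketch of the agent creation, assuming the `CreateAIAgent` extension from `Microsoft.Agents.AI` (the name and instructions are illustrative):

```csharp
using Microsoft.Agents.AI;

// Build the agent on top of the chat client from Step 1.
AIAgent agent = chatClient.CreateAIAgent(
    name: "VisionAgent",
    instructions: "You are a helpful assistant that analyzes images.");
```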
Step 3 — Build the Message with Image Content
Create a ChatMessage that contains both a text prompt and an image URL. Use TextContent for the text and UriContent for the image:
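A sketch of the mixed-content message (the image URL is a placeholder):

```csharp
using Microsoft.Extensions.AI;

// One user message carrying both a text prompt and an image reference.
var message = new ChatMessage(ChatRole.User,
[
    new TextContent("What do you see in this image? Describe it in detail."),
    new UriContent("https://example.com/photo.jpg", "image/jpeg"), // placeholder URL
]);
```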
Note: The second parameter of `UriContent` is the MIME type of the image (e.g., `"image/jpeg"`, `"image/png"`).
Step 4 — Run the Agent
The agent will analyze the image and return a text description.
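Running the agent with the message from Step 3 might look like this:

```csharp
// RunAsync sends the multimodal message; the response text is the description.
AgentRunResponse response = await agent.RunAsync(message);
Console.WriteLine(response.Text);
```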
5. Demo 1 — Basic Image Analysis
This demo shows the simplest use case: sending an image URL to the agent and receiving a text description. The agent analyzes the visual content and describes what it sees.
What it demonstrates:
- Creating a `ChatMessage` with `TextContent` + `UriContent`
- Using `agent.RunAsync(message)` with image input
- Receiving a text response describing the image
Use Cases: Image cataloging, accessibility descriptions, content moderation.
6. Demo 2 — Image Comparison
This demo shows how to send multiple images in a single message. The agent compares the images and identifies similarities and differences.
What it demonstrates:
- Including multiple `UriContent` items in one `ChatMessage`
- Asking the agent to compare visual content
Use Cases: Before/after comparison, quality control, visual diff analysis.
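A sketch of the comparison message, assuming `agent` was created as in the steps above (both URLs are placeholders):

```csharp
using Microsoft.Extensions.AI;

// Multiple UriContent items in a single user message.
var compareMessage = new ChatMessage(ChatRole.User,
[
    new TextContent("Compare these two images. What are the similarities and differences?"),
    new UriContent("https://example.com/before.jpg", "image/jpeg"), // placeholder
    new UriContent("https://example.com/after.jpg", "image/jpeg"),  // placeholder
]);

var response = await agent.RunAsync(compareMessage);
Console.WriteLine(response.Text);
```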
7. Demo 3 — Image + Structured Output
This demo combines multimodal input (from this lesson) with structured output (from Lesson 4). The agent analyzes an image and returns the results as a strongly-typed C# object instead of free-form text.
What it demonstrates:
- Combining `UriContent` with `ChatResponseFormat.ForJsonSchema()`
- Deserializing image analysis into a typed `ImageAnalysisResult` object
- Programmatic access to extracted image data (subject, mood, colors, objects)
Use Cases: Automated image tagging, visual search indexing, content management systems.
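A sketch of this combination, assuming `agent` and the multimodal `message` from the earlier steps; `ImageAnalysisResult` is an illustrative type, and the schema is generated with `AIJsonUtilities.CreateJsonSchema` from `Microsoft.Extensions.AI`:

```csharp
using System.Text.Json;
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;

public record ImageAnalysisResult(string Subject, string Mood, string[] Colors, string[] Objects);

// Constrain the response to JSON matching the ImageAnalysisResult schema.
var options = new ChatClientAgentRunOptions(new ChatOptions
{
    ResponseFormat = ChatResponseFormat.ForJsonSchema(
        AIJsonUtilities.CreateJsonSchema(typeof(ImageAnalysisResult)),
        schemaName: "ImageAnalysisResult"),
});

var response = await agent.RunAsync(message, options: options);

// Deserialize the JSON text into the typed result (web defaults for casing).
var result = JsonSerializer.Deserialize<ImageAnalysisResult>(
    response.Text, JsonSerializerOptions.Web);
Console.WriteLine($"Subject: {result?.Subject}, Mood: {result?.Mood}");
```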
8. Demo 4 — Conversational Image Analysis
This demo shows a multi-turn conversation about an image. The agent receives an image in the first message, then answers follow-up questions about it in subsequent turns — without needing to re-send the image.
What it demonstrates:
- Sending an image in the first message
- Building a conversation history with `List<ChatMessage>`
- Asking follow-up questions that reference the original image
Use Cases: Interactive image exploration, educational tools, customer support with visual context.
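A sketch of the multi-turn flow, assuming `agent` from the earlier steps (the image URL is a placeholder):

```csharp
using Microsoft.Extensions.AI;

// First turn: the image goes into the history once.
var history = new List<ChatMessage>
{
    new(ChatRole.User,
    [
        new TextContent("Describe this image."),
        new UriContent("https://example.com/photo.jpg", "image/jpeg"), // placeholder
    ]),
};

var first = await agent.RunAsync(history);
history.AddRange(first.Messages); // keep the agent's reply in the history

// Follow-up turn: the image stays in context via the history, not a re-send.
history.Add(new(ChatRole.User, "What colors dominate the image?"));
var second = await agent.RunAsync(history);
Console.WriteLine(second.Text);
```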
9. Demo 5 — Streaming Image Analysis
This demo shows how to stream the agent's response while it analyzes an image. Streaming is useful for long analyses, as the user sees progressive results instead of waiting for the complete response.
What it demonstrates:
- Using `agent.RunStreamingAsync(message)` with image input
- Processing streaming updates with `await foreach`
- Displaying results progressively in the console
Use Cases: Real-time analysis dashboards, chat UIs, long-running visual inspections.
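A sketch of the streaming loop, assuming `agent` and the multimodal `message` from the earlier steps:

```csharp
// Stream updates as the model produces them instead of waiting for the full response.
await foreach (var update in agent.RunStreamingAsync(message))
{
    Console.Write(update.Text); // print each chunk as it arrives
}
Console.WriteLine();
```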
10. Best Practices
Do's
- Use a vision-capable model (e.g., `gpt-4o`)
- Always specify the correct MIME type in `UriContent`
- Write clear, specific text prompts alongside images
- Use publicly accessible image URLs
- Combine with structured output for machine-readable results
- Use streaming for detailed image analyses
Don'ts
- Don't send extremely large images (resize or compress first)
- Don't use models that don't support vision (e.g., text-only models)
- Don't send more than 5-10 images in a single message (performance)
- Don't expect pixel-perfect accuracy for text extraction from images
11. Troubleshooting
Problem: Agent cannot analyze the image.
Solution: Ensure you are using a vision-capable model like gpt-4o. Text-only models cannot process images.
Problem: Image URL returns an error.
Solution: Verify the URL is publicly accessible. Private or authenticated URLs will fail. Check the MIME type matches the actual image format.
Problem: Response is generic or inaccurate.
Solution: Write more specific prompts. Instead of "describe this image", try "list all objects visible in this image and estimate their distance from the camera".
Problem: Multi-turn conversation loses image context.
Solution: Include the full conversation history (including the original image message) in each subsequent call.
12. Summary
In this lesson, we learned how to use images with agents in the Microsoft Agent Framework:
- Creating a `ChatMessage` with `TextContent` + `UriContent`
- Analyzing single images and multiple images
- Combining multimodal input with structured output (JSON schema)
- Building multi-turn conversations with image context
- Streaming image analysis responses
Useful Resources
- Official Documentation — Using Images with Agents
- Structured Output (Lesson 4)
- Microsoft Agent Framework GitHub
Running the Application
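Assuming a standard console project layout, running the lesson typically amounts to setting the API key and invoking `dotnet run` (the key value is a placeholder):

```shell
# Set the API key (value is a placeholder) and run the project.
export OPEN_AI_KEY="sk-..."
dotnet run
```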
© 2026 Microsoft Agent Framework Lessons | Lesson 5: Multimodal