Microsoft Agent Framework Generative AI (Tutorial 1) | Created: 14 Apr 2026 | Updated: 14 Apr 2026

Understanding Tokens in LLMs

Introduction

When you work with a large language model (LLM), the text you send is not processed character by character or word by word. Instead, the model first breaks text into small pieces called tokens. A token might be a word, part of a word, or even punctuation. When you see "token limits," this refers to how much text the model can process at once.

Understanding tokens is essential for developers working with generative AI because tokens directly affect billing, context window limits, and application design.

What Is a Token?

A token is the smallest unit of text that an LLM can process. During training, text is first broken into tokens by a tokenizer. The LLM then analyzes the semantic relationships between tokens — how commonly they are used together or in similar contexts. After training, the LLM uses those learned patterns to generate output tokens based on the input sequence.

For example, the sentence:

I heard a dog bark loudly at a cat

could be tokenized into 9 tokens:

  1. I
  2. heard
  3. a
  4. dog
  5. bark
  6. loudly
  7. at
  8. a
  9. cat

The set of all unique tokens the model knows is called its vocabulary. Each token gets a unique ID. The sentence above would be represented as a sequence of token IDs: [1, 2, 3, 4, 5, 6, 7, 3, 8]. Notice that the word "a" appears twice but uses the same ID (3).
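The vocabulary lookup described above can be sketched with a toy word-level tokenizer. This is illustrative only (real tokenizers use subword algorithms like BPE), and the ID assignment scheme, starting at 1 in encounter order, is an assumption chosen to mirror the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy vocabulary: each unique word gets one ID; repeats reuse it.
var vocabulary = new Dictionary<string, int>();
int nextId = 1;

List<int> Encode(string text) =>
    text.Split(' ')
        .Select(word =>
        {
            if (!vocabulary.TryGetValue(word, out int id))
                vocabulary[word] = id = nextId++;
            return id;
        })
        .ToList();

var ids = Encode("I heard a dog bark loudly at a cat");
Console.WriteLine(string.Join(", ", ids)); // 1, 2, 3, 4, 5, 6, 7, 3, 8
```

Note how the second occurrence of "a" resolves to the same ID (3) via the dictionary lookup, exactly as in the ID sequence above.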

Common Tokenization Methods

Method | Description | Example
Word Tokenization | Text is split into individual words based on a delimiter (e.g. spaces) | "Hello world" → ["Hello", "world"]
Character Tokenization | Text is split into individual characters | "Hello" → ["H", "e", "l", "l", "o"]
Subword Tokenization | Text is split into partial words or character sets (e.g. BPE) | "unbelievable" → ["un", "believ", "able"]

GPT models use Byte-Pair Encoding (BPE), a type of subword tokenization. In typical English text, one token is approximately four characters, and 100 tokens are roughly 75 words.
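A quick sketch comparing the granularities on the sample sentence. The counts are simple splits, and the BPE figure is only the four-characters-per-token rule of thumb, not a real BPE run:

```csharp
using System;

string text = "I heard a dog bark loudly at a cat";

int wordTokens = text.Split(' ').Length;                 // one token per word
int charTokens = text.Length;                            // one token per character (spaces too)
int bpeEstimate = (int)Math.Ceiling(text.Length / 4.0);  // ~4 chars/token rule of thumb

Console.WriteLine($"Word tokens:      {wordTokens}");   // 9
Console.WriteLine($"Character tokens: {charTokens}");   // 34
Console.WriteLine($"BPE estimate:     {bpeEstimate}");  // 9
```

The same 34-character sentence costs 9 tokens under word tokenization but 34 under character tokenization, which motivates the trade-off table below.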

Trade-offs Between Token Sizes

Token Size | Pros | Cons
Smaller tokens (character/subword) | Handles unknown words, typos, and complex syntax well; may reduce vocabulary size | More tokens per text — more computation needed; smaller effective input/output
Larger tokens (word) | Fewer tokens per text — less computation; larger effective input/output | Larger vocabulary needed; struggles with unknown words and typos

How LLMs Use Tokens

After tokenization, each token is assigned a numeric ID. Text becomes a sequence of numbers. These token ID sequences are then mapped to embeddings — multi-valued numeric vectors that represent semantic meaning. Tokens used in similar contexts end up with similar embedding values.
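The "similar contexts, similar embeddings" idea can be demonstrated with cosine similarity on toy vectors. The three-dimensional values below are made up for illustration (real embeddings have hundreds or thousands of dimensions):

```csharp
using System;

// Made-up 3-dimensional embeddings: "dog" and "cat" appear in similar
// contexts, so their vectors are close; "loudly" points elsewhere.
double[] dog    = { 0.8, 0.6, 0.1 };
double[] cat    = { 0.7, 0.7, 0.2 };
double[] loudly = { 0.1, 0.2, 0.9 };

static double Cosine(double[] a, double[] b)
{
    double dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot  += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
}

Console.WriteLine($"dog vs cat:    {Cosine(dog, cat):F3}");    // close to 1.0
Console.WriteLine($"dog vs loudly: {Cosine(dog, loudly):F3}"); // much lower
```

Cosine similarity is one common way to compare embeddings; values near 1.0 mean the vectors point in nearly the same direction.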

During output generation, the model predicts the next token in the sequence:

  1. The model assigns weights to each preceding token, representing its influence on what comes next.
  2. It uses the weights and embeddings to calculate the predicted next vector value.
  3. The model selects the most probable token from its vocabulary.
  4. This process repeats iteratively — each output token feeds back as input for the next prediction.
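The four steps above can be sketched as a toy greedy decoding loop. The "model" here is a hard-coded lookup of next-token probabilities, a stand-in for the real weighted prediction over embeddings:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical next-token probabilities, standing in for the model.
var nextTokenProbabilities = new Dictionary<string, Dictionary<string, double>>
{
    ["The"]    = new() { ["dog"] = 0.6, ["cat"] = 0.4 },
    ["dog"]    = new() { ["barked"] = 0.7, ["slept"] = 0.3 },
    ["barked"] = new() { ["<end>"] = 1.0 },
};

var sequence = new List<string> { "The" };
while (true)
{
    string last = sequence[^1];
    if (!nextTokenProbabilities.TryGetValue(last, out var candidates))
        break;

    // Step 3: select the most probable token from the vocabulary.
    string next = candidates.MaxBy(kv => kv.Value).Key;
    if (next == "<end>") break;

    sequence.Add(next); // Step 4: the output token feeds back as input.
}

Console.WriteLine(string.Join(" ", sequence)); // The dog barked
```

Real models condition on the entire preceding sequence (step 1) rather than only the last token, and often sample from the distribution instead of always taking the maximum.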

Token Limits (Context Window)

LLMs have a maximum number of tokens for input and output combined, called the context window. The model's token limit and tokenization method together determine the maximum length of text that can be provided as input or generated as output.

For example, if a model has a context window of 100 tokens and your input consumes 9 word tokens, you have 91 tokens left for the output. With character tokenization, the same input might consume 34 tokens, leaving only 66 for the output.
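The budgeting arithmetic from this example is trivial but worth making explicit, since it is exactly what you compute before every request:

```csharp
using System;

// A 100-token context window shared between input and output.
const int contextWindow = 100;

int wordInputTokens = 9;   // word tokenization of the sample sentence
int charInputTokens = 34;  // character tokenization of the same sentence

int leftAfterWords = contextWindow - wordInputTokens;  // 91
int leftAfterChars = contextWindow - charInputTokens;  // 66

Console.WriteLine($"Word tokenization:      {leftAfterWords} tokens left for output");
Console.WriteLine($"Character tokenization: {leftAfterChars} tokens left for output");
```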

Common context window sizes:

Model | Context Window
GPT-3.5 Turbo | 16,385 tokens
GPT-4 Turbo | 128,000 tokens
GPT-4o | 128,000 tokens

Token-Based Pricing

Generative AI services use token-based pricing. The cost of each request depends on the number of input tokens (your prompt) and output tokens (the model's response). Pricing is typically expressed as "price per 1 million tokens" and may differ between input and output.

This pricing model has a significant effect on how you design user interactions and how much preprocessing and post-processing you add. Counting tokens before sending requests helps you estimate costs and stay within budget.
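A back-of-the-envelope estimate makes the pricing model concrete. The rates below are placeholders, not current prices; substitute your provider's published rates:

```csharp
using System;

// Assumed prices — replace with your provider's current rates.
const decimal inputPricePerMillion  = 2.50m;  // $ per 1M input tokens
const decimal outputPricePerMillion = 10.00m; // $ per 1M output tokens

decimal EstimateCost(int inputTokens, int outputTokens) =>
    inputTokens  / 1_000_000m * inputPricePerMillion
  + outputTokens / 1_000_000m * outputPricePerMillion;

// 1,000 requests, each with a 500-token prompt and a 200-token reply:
decimal total = 1_000 * EstimateCost(500, 200);
Console.WriteLine($"${total:F2}"); // $3.25
```

Note the asymmetry: output tokens typically cost several times more than input tokens, so verbose responses dominate the bill even for long prompts.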

Tokenization in .NET with Microsoft.ML.Tokenizers

The Microsoft.ML.Tokenizers NuGet library provides tools for tokenizing text in .NET applications. It supports multiple tokenization algorithms including BPE, Tiktoken (used by GPT models), Llama, and CodeGen.

Install the packages:

dotnet add package Microsoft.ML.Tokenizers
dotnet add package Microsoft.ML.Tokenizers.Data.O200kBase

Key Operations

Method | Description
CountTokens(text) | Returns the number of tokens in a text string
EncodeToIds(text) | Converts text to a list of token IDs
EncodeToTokens(text) | Returns detailed token information including values and IDs
Decode(ids) | Converts token IDs back to text
GetIndexByTokenCount(text, count) | Finds the character index for a specific token count from the start

Full Example

using Microsoft.ML.Tokenizers;

namespace MicrosoftAgentFrameworkLesson.ConsoleApp;

public static class TokenizationDemo
{
    // A small recipe catalog — completely unrelated to the official docs examples
    private static readonly string[] RecipeCatalog =
    [
        "Creamy mushroom risotto with parmesan cheese and fresh thyme, slow-cooked for 25 minutes until perfectly al dente.",
        "Spicy Thai basil chicken stir-fry served over jasmine rice with crispy shallots and a squeeze of lime.",
        "Classic New York cheesecake with a graham cracker crust, topped with fresh strawberry compote.",
        "Grilled salmon fillet glazed with honey-soy marinade, accompanied by roasted asparagus and quinoa.",
        "Authentic Neapolitan margherita pizza with San Marzano tomatoes, fresh mozzarella, and basil leaves."
    ];

    public static Task RunAsync()
    {
        // Initialize the Tiktoken tokenizer for the gpt-4o model
        Tokenizer tokenizer = TiktokenTokenizer.CreateForModel("gpt-4o");

        // ══════════════════════════════════════════════
        // PART 1 — Tokenize a Single Recipe
        // ══════════════════════════════════════════════
        Console.WriteLine("═══ Part 1: Tokenize a Single Recipe ═══\n");

        string sampleRecipe = RecipeCatalog[0];
        Console.WriteLine($"Original text:\n  \"{sampleRecipe}\"\n");

        // Count tokens
        int tokenCount = tokenizer.CountTokens(sampleRecipe);
        Console.WriteLine($"Token count: {tokenCount}");

        // Encode to token IDs
        IReadOnlyList<int> tokenIds = tokenizer.EncodeToIds(sampleRecipe);
        Console.WriteLine($"Token IDs: [{string.Join(", ", tokenIds)}]");

        // Encode to detailed tokens (value + ID pairs)
        IReadOnlyList<EncodedToken> detailedTokens = tokenizer.EncodeToTokens(sampleRecipe, out _);
        Console.WriteLine("\nDetailed tokens:");
        foreach (EncodedToken token in detailedTokens)
        {
            Console.WriteLine($"  ID: {token.Id,6} → \"{token.Value}\"");
        }

        // Decode back to text
        string? decoded = tokenizer.Decode(tokenIds);
        Console.WriteLine($"\nDecoded text:\n  \"{decoded}\"");
        Console.WriteLine($"Round-trip match: {sampleRecipe == decoded}\n");

        // ══════════════════════════════════════════════
        // PART 2 — Compare Token Counts Across Recipes
        // ══════════════════════════════════════════════
        Console.WriteLine("═══ Part 2: Token Counts Across Recipes ═══\n");

        Console.WriteLine($"{"Recipe",-60} {"Chars",6} {"Tokens",7}");
        Console.WriteLine(new string('─', 75));

        foreach (string recipe in RecipeCatalog)
        {
            int count = tokenizer.CountTokens(recipe);
            string preview = recipe.Length > 57 ? recipe[..57] + "..." : recipe;
            Console.WriteLine($"{preview,-60} {recipe.Length,6} {count,7}");
        }

        // ══════════════════════════════════════════════
        // PART 3 — Context Window Management
        // ══════════════════════════════════════════════
        Console.WriteLine("\n═══ Part 3: Context Window Management ═══\n");

        // Simulate a prompt that combines all recipes
        string fullPrompt = "You are a chef assistant. Summarize these recipes:\n\n"
            + string.Join("\n\n", RecipeCatalog.Select((r, i) => $"{i + 1}. {r}"));

        int promptTokens = tokenizer.CountTokens(fullPrompt);
        const int modelContextWindow = 128_000; // gpt-4o context window
        int remainingTokens = modelContextWindow - promptTokens;

        Console.WriteLine($"Full prompt token count : {promptTokens}");
        Console.WriteLine($"Model context window    : {modelContextWindow:N0}");
        Console.WriteLine($"Remaining for output    : {remainingTokens:N0}");

        // Trim to first N tokens
        const int maxInputTokens = 50;
        int trimIndex = tokenizer.GetIndexByTokenCount(fullPrompt, maxInputTokens, out string? processed, out _);
        processed ??= fullPrompt;
        string trimmed = processed[..trimIndex];

        Console.WriteLine($"\nTrimmed to first {maxInputTokens} tokens:");
        Console.WriteLine($"  \"{trimmed}...\"");
        Console.WriteLine($"  Actual token count: {tokenizer.CountTokens(trimmed)}");

        // ══════════════════════════════════════════════
        // PART 4 — Token-Based Cost Estimation
        // ══════════════════════════════════════════════
        Console.WriteLine("\n═══ Part 4: Token-Based Cost Estimation ═══\n");

        // Hypothetical pricing: $2.50 per 1M input tokens, $10.00 per 1M output tokens (gpt-4o pricing)
        const decimal inputPricePerMillionTokens = 2.50m;
        const decimal outputPricePerMillionTokens = 10.00m;
        const int estimatedOutputTokens = 200;

        decimal inputCost = promptTokens / 1_000_000m * inputPricePerMillionTokens;
        decimal outputCost = estimatedOutputTokens / 1_000_000m * outputPricePerMillionTokens;
        decimal totalCost = inputCost + outputCost;

        Console.WriteLine($"Input tokens    : {promptTokens,8} → ${inputCost:F6}");
        Console.WriteLine($"Output tokens   : {estimatedOutputTokens,8} → ${outputCost:F6}");
        Console.WriteLine($"Estimated total :          → ${totalCost:F6}");

        return Task.CompletedTask;
    }
}

Reference

Understand tokens - Microsoft Learn
