Microsoft Agent Framework Concept (Chapter 2) Created: 20 Apr 2026 Updated: 20 Apr 2026

How LLMs Work: From Raw Text to Predictive Generation & Training

Large language models (LLMs) look like magic from the outside — you type text, and the model writes a thoughtful answer. Under the hood, the process is a four-stage pipeline: raw text is broken into tokens, each token becomes a vector, a transformer predicts the next token, and during training a loss function nudges the model’s weights toward better predictions. The same next-token loop that trained the model is the loop that generates every output.

This article walks through the four stages end-to-end and shows a working .NET example using the Microsoft.ML.Tokenizers library that implements the first stage and simulates the rest for teaching purposes.

Stage 1 — Tokenization & ID Assignment

The first thing a model does with text is chop it into tokens. A token can be a whole word, part of a word, a single character, or punctuation. The tokenizer is the component that performs this split, and it also assigns every unique token a numeric ID. The full set of IDs the tokenizer knows is called the vocabulary.

Example — the sentence "The quick brown fox." might tokenize to:

Tokens: [The]  [ quick]  [ brown]  [ fox]  [.]
IDs:     464     2017      3652     18814   13

From this point on, the model never sees text. It only sees numbers. Every downstream stage operates on these IDs.
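This split can be reproduced with the real GPT-4o tokenizer via the Microsoft.ML.Tokenizers package (the same package the full demo at the end of this article installs); the exact tokens and IDs you see depend on the tokenizer's vocabulary:

```csharp
using System;
using Microsoft.ML.Tokenizers;

// Requires the Microsoft.ML.Tokenizers and
// Microsoft.ML.Tokenizers.Data.O200kBase NuGet packages.
Tokenizer tokenizer = TiktokenTokenizer.CreateForModel("gpt-4o");

// Stage 1: text in, (token, ID) pairs out. The model only ever sees the IDs.
foreach (EncodedToken token in tokenizer.EncodeToTokens("The quick brown fox.", out _))
{
    Console.WriteLine($"\"{token.Value}\" -> {token.Id}");
}
```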

Stage 2 — Contextual Embedding (Semantic Representation)

A raw integer ID says nothing about meaning: ID 464 isn’t “close to” ID 465 in any useful way. To capture meaning, the model passes each ID through an embedding layer — essentially a big lookup table that maps every token ID to a high-dimensional vector of floating-point numbers.

These vectors are learned during training so that tokens that appear in similar contexts end up with similar vectors. The word quick sits near fast and speedy; fox sits near dog and animal. The model does not know what a fox is, but it knows which other tokens behave like it.

Embeddings are what let the model generalize: a sentence it has never seen can still be represented in roughly the same region of vector space as sentences it was trained on.
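A quick sketch of why this matters: cosine similarity is the standard way to compare embedding vectors. The three-dimensional vectors below are made-up toy values (real embeddings have hundreds or thousands of dimensions), chosen so that "quick" and "fast" point in roughly the same direction while "fox" does not:

```csharp
using System;
using System.Linq;

// Hypothetical toy embeddings, invented for illustration only.
double[] quick = { 0.80, 0.10, 0.30 };
double[] fast  = { 0.75, 0.15, 0.35 };
double[] fox   = { -0.20, 0.90, 0.40 };

// Cosine similarity: dot product divided by the product of the lengths.
// Near 1 means "points the same way"; near 0 means "unrelated".
static double Cosine(double[] a, double[] b)
{
    double dot = a.Zip(b, (x, y) => x * y).Sum();
    double na = Math.Sqrt(a.Sum(x => x * x));
    double nb = Math.Sqrt(b.Sum(x => x * x));
    return dot / (na * nb);
}

Console.WriteLine($"quick vs fast: {Cosine(quick, fast):F3}"); // high: similar contexts
Console.WriteLine($"quick vs fox : {Cosine(quick, fox):F3}");  // low: unrelated
```

The same calculation, applied to real embedding vectors, is what powers semantic search and retrieval.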

Stage 3 — Next Token Prediction (Iterative Inference)

With the input represented as a sequence of embedding vectors, the transformer block comes next. It applies attention, which computes how much influence each prior token should have on what comes next, and then a feed-forward network mixes that information to produce a predicted next vector.

The predicted vector is then projected onto the vocabulary, producing a score (logit) for every token, and a softmax turns those scores into probabilities. The candidate with the highest probability (or one sampled from the distribution) becomes the next token. That token is appended to the context and the whole process repeats. This is the autoregressive loop:

  1. Take all tokens so far.
  2. Run them through the transformer.
  3. Pick the next token.
  4. Append it and go back to step 1.

Generating one token at a time is why LLMs “stream” their answers word by word.
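The steps above can be sketched with a toy lookup table standing in for the transformer. The table and its entries are invented for illustration, but the control flow is exactly the autoregressive loop:

```csharp
using System;
using System.Collections.Generic;

// Toy "model": a bigram lookup table plays the role of the transformer.
var nextToken = new Dictionary<string, string>
{
    ["The"]    = " quick",
    [" quick"] = " brown",
    [" brown"] = " fox",
    [" fox"]   = ".",
};

var context = new List<string> { "The" };                    // 1. tokens so far
while (nextToken.TryGetValue(context[^1], out var next))     // 2. "run the model"
{
    context.Add(next);                                       // 3. pick and append
    if (next == ".") break;                                  // stop token ends generation
}                                                            // 4. otherwise loop again

Console.WriteLine(string.Concat(context)); // The quick brown fox.
```

A real model scores the entire vocabulary at every step instead of following a fixed table, but the append-and-repeat structure is identical.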

Stage 4 — Training Loop (Learning from Loss)

Stages 1–3 describe how a trained model generates text. Stage 4 is how the model became trained in the first place.

During training, the model is given a real piece of text and asked to predict the next token at every position. Its prediction (a probability distribution over the vocabulary) is compared against the actual next token from the training data using a loss function — usually cross-entropy:

loss = -log(probability assigned to the correct next token)

If the model gave the correct token a high probability, the loss is small. If it gave it a low probability, the loss is large. Backpropagation then computes how each weight in the network contributed to that loss and nudges every weight in the direction that would have reduced it. Repeat this across billions of training tokens and the model gradually becomes good at predicting what comes next.
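The effect of the formula is easy to see with two concrete probabilities, one confident and one unsure:

```csharp
using System;

// Cross-entropy for two predictions of the same ground-truth token.
double confident = -Math.Log(0.90); // truth given 90% probability: small loss
double unsure    = -Math.Log(0.10); // truth given 10% probability: large loss

Console.WriteLine($"p = 0.90 -> loss = {confident:F3}"); // 0.105
Console.WriteLine($"p = 0.10 -> loss = {unsure:F3}");    // 2.303
```

A tenfold drop in the probability assigned to the correct token raises the loss by a constant amount (ln 10 ≈ 2.3), which is what makes the gradient signal well behaved across billions of training tokens.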

Why This Matters for Developers

  1. Billing is tied to Stage 1 — you pay per token, not per character.
  2. Semantic search, retrieval, and RAG use Stage 2 — embeddings let you compare meaning.
  3. Streaming responses and latency come from Stage 3 — one token per forward pass.
  4. Fine-tuning is Stage 4 applied to your own data, usually with a much smaller loss-and-update loop.
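For point 1, Microsoft.ML.Tokenizers lets you count billable tokens locally before ever calling an API. A minimal sketch, using the same GPT-4o tokenizer as the demo below:

```csharp
using System;
using Microsoft.ML.Tokenizers;

// Requires the Microsoft.ML.Tokenizers and
// Microsoft.ML.Tokenizers.Data.O200kBase NuGet packages.
Tokenizer tokenizer = TiktokenTokenizer.CreateForModel("gpt-4o");

string prompt = "Orion the hunter shines in the winter sky.";
int tokenCount = tokenizer.CountTokens(prompt);

// You are billed per token, not per character.
Console.WriteLine($"{prompt.Length} characters, {tokenCount} billable tokens");
```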

A .NET Demo of the Pipeline

The example below uses Microsoft.ML.Tokenizers for Stage 1 with the real GPT-4o tokenizer, and it simulates the other three stages with clearly illustrative (not real) math so you can trace a single sentence through every stage. The scenario is deliberately simple: short descriptions of constellations.

Install the packages:

dotnet add package Microsoft.ML.Tokenizers
dotnet add package Microsoft.ML.Tokenizers.Data.O200kBase

Full Example

using Microsoft.ML.Tokenizers;

namespace MicrosoftAgentFrameworkLesson.ConsoleApp;

public static class HowLlmsWorkDemo
{
    // Original domain: short astronomy / constellation descriptions.
    private static readonly string[] ConstellationTexts =
    [
        "Orion the hunter shines in the winter sky.",
        "Cassiopeia forms a distinct W shape above.",
        "Ursa Major hosts the famous Big Dipper stars."
    ];

    public static Task RunAsync()
    {
        Console.WriteLine("╔══════════════════════════════════════════════════════╗");
        Console.WriteLine("║   How LLMs Work — Four-Stage Pipeline (Simulation)   ║");
        Console.WriteLine("╚══════════════════════════════════════════════════════╝\n");

        Tokenizer tokenizer = TiktokenTokenizer.CreateForModel("gpt-4o");
        string sample = ConstellationTexts[0];

        // ══════════════════════════════════════════════
        // STAGE 1 — Tokenization & ID Assignment
        // ══════════════════════════════════════════════
        Console.WriteLine("═══ Stage 1: Tokenization & ID Assignment ═══");
        Console.WriteLine($"Input : \"{sample}\"\n");

        IReadOnlyList<EncodedToken> tokens = tokenizer.EncodeToTokens(sample, out _);
        Console.WriteLine($"{"Token",-15} {"ID",10}");
        Console.WriteLine(new string('─', 27));
        foreach (EncodedToken t in tokens)
        {
            Console.WriteLine($"{"\"" + t.Value + "\"",-15} {t.Id,10}");
        }

        IReadOnlyList<int> ids = tokenizer.EncodeToIds(sample);
        Console.WriteLine($"\nID sequence: [{string.Join(", ", ids)}]\n");

        // ══════════════════════════════════════════════
        // STAGE 2 — Contextual Embedding (Simulated)
        // ══════════════════════════════════════════════
        Console.WriteLine("═══ Stage 2: Contextual Embedding (Simulated) ═══");
        Console.WriteLine("Each token ID is mapped to a dense vector that captures its meaning.\n");

        foreach (EncodedToken t in tokens.Take(5))
        {
            double[] emb = SimulateEmbedding(t.Id, dim: 4);
            string vec = "[" + string.Join(", ", emb.Select(x => x.ToString("F3"))) + "]";
            Console.WriteLine($"  \"{t.Value}\" → {vec}");
        }
        Console.WriteLine();

        // ══════════════════════════════════════════════
        // STAGE 3 — Next Token Prediction (Autoregressive Loop)
        // ══════════════════════════════════════════════
        Console.WriteLine("═══ Stage 3: Next Token Prediction (Autoregressive) ═══");
        Console.WriteLine("The model uses prior tokens to predict the most probable next token.\n");

        List<int> context = [];
        for (int i = 0; i < Math.Min(6, ids.Count); i++)
        {
            // Simulation: the ground-truth next token stands in for the model's prediction.
            int nextId = ids[i];
            string soFar = context.Count == 0 ? "(empty)" : tokenizer.Decode(context) ?? "";
            Console.WriteLine($"  Step {i + 1}");
            Console.WriteLine($"    Context so far : \"{soFar}\"");
            Console.WriteLine($"    Predicted next : ID {nextId} (\"{tokens[i].Value}\")\n");
            context.Add(nextId);
        }

        // ══════════════════════════════════════════════
        // STAGE 4 — Training Loop (Loss + Weight Adjustment)
        // ══════════════════════════════════════════════
        Console.WriteLine("═══ Stage 4: Training Loop (Simulated Loss) ═══");
        Console.WriteLine("Predicted probabilities are compared to the ground-truth token using cross-entropy.\n");

        Dictionary<string, double> candidateProbs = new()
        {
            { "shines", 0.62 },
            { "glows", 0.21 },
            { "burns", 0.10 },
            { "orbits", 0.05 },
            { "spins", 0.02 }
        };
        const string truth = "shines";
        double pCorrect = candidateProbs[truth];
        double crossEntropyLoss = -Math.Log(pCorrect);

        Console.WriteLine($"{"Candidate",-12} {"Probability",12}");
        Console.WriteLine(new string('─', 25));
        foreach ((string word, double p) in candidateProbs)
        {
            string marker = word == truth ? "  ← truth" : "";
            Console.WriteLine($"{word,-12} {p,12:F3}{marker}");
        }

        Console.WriteLine();
        Console.WriteLine($"Cross-entropy loss = -log({pCorrect:F3}) = {crossEntropyLoss:F4}");
        Console.WriteLine("Backpropagation then adjusts the weights so P(truth) grows next time.\n");

        return Task.CompletedTask;
    }

    private static double[] SimulateEmbedding(int tokenId, int dim)
    {
        // Deterministic pseudo-random unit vector derived from the token ID.
        // NOT a real embedding — illustrative only.
        Random rng = new(tokenId);
        double[] v = new double[dim];
        double norm = 0;
        for (int i = 0; i < dim; i++)
        {
            v[i] = rng.NextDouble() * 2 - 1;
            norm += v[i] * v[i];
        }
        norm = Math.Sqrt(norm);
        for (int i = 0; i < dim; i++) v[i] /= norm;
        return v;
    }
}

Reference

How generative AI and LLMs work - Microsoft Learn
