Temperature – What Is It?
Introduction
When you send a prompt to a large language model (LLM), the model does not simply look up a fixed answer. It calculates a probability distribution over its vocabulary and samples the next token from that distribution. Temperature is the parameter that controls how that sampling is performed — and therefore how predictable or creative the model's output will be.
Understanding temperature is essential for anyone building AI-powered applications because choosing the wrong value can make an agent unreliable (too high) or dull and repetitive (too low).
What Is Temperature?
Temperature is a floating-point number — typically between 0.0 and 2.0 depending on the provider — that divides the logits (raw prediction scores) by the temperature value before they are converted to probabilities via the softmax function.
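Concretely, if z is the vector of logits and T the temperature, the sampling probabilities come from a temperature-scaled softmax:

```latex
p_i = \mathrm{softmax}\!\left(\frac{z}{T}\right)_i = \frac{e^{z_i / T}}{\sum_{j} e^{z_j / T}}
```

As T approaches 0 the largest logit takes essentially all of the probability mass; as T grows the distribution approaches uniform.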
- Lower temperature (0.0 – 0.3) — The probability mass is concentrated on the top tokens. The model almost always picks the highest-probability token, producing deterministic, focused output.
- Medium temperature (0.4 – 0.6) — A balanced middle ground that introduces some variety without sacrificing too much coherence.
- Higher temperature (0.7 – 1.0) — The probability distribution flattens out, giving lower-ranked tokens a better chance of being selected. The result is more creative, varied output.
- Very high temperature (above 1.0) — The distribution flattens toward uniform. Output may become incoherent or nonsensical because extremely unlikely tokens are now chosen frequently.
How Temperature Affects the Probability Distribution
Imagine the model has narrowed down the next token to three candidates with the following raw logits:
| Token | Logit | Probability at T=0.5 | Probability at T=2.0 |
|---|---|---|---|
| "sun" | 5.0 | ~99.7% | ~73.6% |
| "moon" | 2.0 | ~0.2% | ~16.4% |
| "star" | 1.0 | ~0.0% | ~10.0% |
At low temperature the winner ("sun") takes almost all the probability. At higher temperature the distribution flattens and "moon" or "star" have a reasonable chance of being picked.
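The effect can be computed directly. The helper below applies a temperature-scaled softmax to the example logits — an illustrative sketch in plain C#, not part of any SDK:

```csharp
using System;
using System.Linq;

static class TemperatureDemo
{
    // Divide each logit by T, then apply softmax.
    // Subtracting the max scaled logit first keeps Math.Exp from overflowing.
    public static double[] SoftmaxWithTemperature(double[] logits, double t)
    {
        double[] scaled = logits.Select(z => z / t).ToArray();
        double max = scaled.Max();
        double[] exps = scaled.Select(z => Math.Exp(z - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }

    static void Main()
    {
        double[] logits = { 5.0, 2.0, 1.0 }; // "sun", "moon", "star"
        foreach (double t in new[] { 0.5, 2.0 })
        {
            double[] p = SoftmaxWithTemperature(logits, t);
            Console.WriteLine(
                $"T={t}: sun={p[0] * 100:F1}% moon={p[1] * 100:F1}% star={p[2] * 100:F1}%");
        }
    }
}
```

Lowering T widens the gap between the scaled logits, so the top token dominates; raising T shrinks the gap and flattens the distribution.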
Temperature Value Guide
| Temperature | Behavior | Best For |
|---|---|---|
| 0.0 | Near-deterministic; almost always picks the top token | Data extraction, classification, code generation |
| 0.1 – 0.3 | Focused with slight variation | Summaries, Q&A, agent tool-calling |
| 0.4 – 0.6 | Moderate variety | Drafting emails, general chat |
| 0.7 – 1.0 | Creative and diverse | Creative writing, brainstorming, storytelling |
| >1.0 | Highly random; may lose coherence | Experimental / artistic generation |
Related Parameters
Temperature does not work in isolation. Two sibling parameters often interact with it:
- Top P (nucleus sampling) — Instead of scaling logits, Top P truncates the candidate list to the smallest set whose cumulative probability is at least P. A value of 0.9 means "consider only tokens that together account for 90% of the probability." Microsoft recommends changing either Temperature or Top P, but not both at once.
- Frequency / Presence Penalty — These reduce the chance of tokens that have already appeared in the output, discouraging repetition independently of temperature.
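To make the Top P mechanism concrete, here is a small sketch of nucleus truncation (illustrative only; a real sampler then draws a token from the renormalized set):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class NucleusDemo
{
    // Keep the smallest set of tokens (taken in descending probability
    // order) whose cumulative probability reaches p, then renormalize.
    public static Dictionary<string, double> TopP(
        Dictionary<string, double> probs, double p)
    {
        var kept = new List<KeyValuePair<string, double>>();
        double cumulative = 0.0;
        foreach (var pair in probs.OrderByDescending(kv => kv.Value))
        {
            kept.Add(pair);
            cumulative += pair.Value;
            if (cumulative >= p) break;
        }
        return kept.ToDictionary(kv => kv.Key, kv => kv.Value / cumulative);
    }

    static void Main()
    {
        var probs = new Dictionary<string, double>
        {
            ["sun"] = 0.70, ["moon"] = 0.20, ["star"] = 0.10,
        };
        // With p = 0.85, "sun" + "moon" (cumulative 0.90) cover the
        // nucleus, so "star" can never be sampled.
        foreach (var kv in TopP(probs, 0.85))
            Console.WriteLine($"{kv.Key}: {kv.Value:F3}");
    }
}
```

Note the difference in kind: temperature reshapes the whole distribution, while Top P cuts off its tail — which is why providers suggest tuning one or the other rather than both.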
Temperature in Agent Applications
For AI agents that call tools, make decisions, and interact with external systems, low temperatures (0.0 – 0.3) are strongly recommended. Agent frameworks rely on structured, repeatable responses — a creative answer that randomly reformats JSON or invents a non-existent tool call can break the entire workflow. Even at temperature 0, LLM outputs are not fully deterministic due to floating-point arithmetic and batching differences, but they are consistent enough for reliable agent behavior.
Setting Temperature in .NET with Microsoft.Extensions.AI
The Microsoft.Extensions.AI library provides the ChatOptions class with a Temperature property. You can set it per-call so a single application can use different temperatures for different tasks — low for data extraction, high for creative content.
Full Example
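A minimal sketch is shown below. It assumes an IChatClient implementation (for example, an OpenAI or Azure OpenAI adapter) is constructed elsewhere, and uses the GetResponseAsync/ChatOptions shape of recent Microsoft.Extensions.AI releases — check the current API surface before copying:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

static class TemperatureExample
{
    // The same client can use a different temperature per call:
    // low for structured extraction, high for creative content.
    public static async Task RunAsync(IChatClient client)
    {
        // Near-deterministic: suited to extraction and tool-calling.
        var extraction = await client.GetResponseAsync(
            "Extract the invoice number from: 'Invoice INV-1042, due 1 May.'",
            new ChatOptions { Temperature = 0.1f });
        Console.WriteLine(extraction.Text);

        // Creative: a higher temperature flattens the distribution.
        var poem = await client.GetResponseAsync(
            "Write a two-line poem about the sea.",
            new ChatOptions { Temperature = 0.9f });
        Console.WriteLine(poem.Text);
    }
}
```

Because ChatOptions is passed per call, a single registered client can serve both the deterministic and the creative paths of an application without reconfiguration.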
Reference
Microsoft Learn – LLM Fundamentals (Temperature and Determinism)