Prompt Execution Settings in Semantic Kernel
What Are Prompt Execution Settings?
Prompt Execution Settings are parameters passed to the LLM alongside your prompt. They don't change what you are asking the AI, but rather how the AI should go about generating the answer.
In Semantic Kernel architecture, Execution Settings are a core component. While there is a base PromptExecutionSettings class, developers working with OpenAI or Azure OpenAI models will primarily use their specific derivatives:
- OpenAIPromptExecutionSettings
- AzureOpenAIPromptExecutionSettings
These classes expose properties that directly map to the API parameters of the underlying models (like GPT-4 or GPT-3.5 Turbo).
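As a minimal sketch of how these settings travel alongside a prompt (assuming the Microsoft.SemanticKernel and Microsoft.SemanticKernel.Connectors.OpenAI packages; the model id and API key are placeholders):

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;

var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(modelId: "gpt-4", apiKey: "<your-api-key>")
    .Build();

// Execution settings are passed to the invocation via KernelArguments,
// separate from the prompt text itself.
var settings = new OpenAIPromptExecutionSettings
{
    Temperature = 0.7,
    MaxTokens = 256
};

var result = await kernel.InvokePromptAsync(
    "Summarize the fall of the Roman Republic.",
    new KernelArguments(settings));

Console.WriteLine(result);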
Key Execution Setting Properties
Let's examine the most popular properties used to configure AI behavior, understanding their accepted values and impact.
1. Temperature: Controlling Randomness
- Type: double
- Default Value: usually 1.0 (depending on the specific model)
- Range: typically 0.0 to 2.0
Temperature is perhaps the most frequently tweaked setting. It controls the randomness and creativity of the model's completion by rescaling the probability distribution from which the next token is sampled.
- Lower Values (e.g., 0.0 - 0.4): The model becomes more deterministic, focused, and conservative. It will almost always choose the most probable next token. Use this for tasks requiring factual accuracy, code generation, or classification.
- Higher Values (e.g., 1.0 - 1.5+): The model takes more risks, choosing less probable tokens. This leads to more creative, diverse, and sometimes unexpected outputs. Use this for creative writing, brainstorming, or generating novel ideas.
Analogy: A temperature of 0.0 is like an accountant; a temperature of 1.5 is like a poet.
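The contrast between the two regimes might look like this in practice (a sketch; the exact values are illustrative, not prescriptive):

```csharp
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Near-deterministic: suited to extraction, classification, or code generation.
var factualSettings = new OpenAIPromptExecutionSettings
{
    Temperature = 0.1
};

// High-variance: suited to brainstorming and creative writing.
var creativeSettings = new OpenAIPromptExecutionSettings
{
    Temperature = 1.4
};
```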
2. MaxTokens: Controlling Length
- Type: int? (nullable integer)
- Value: a positive integer representing the limit
MaxTokens sets a hard limit on the maximum number of tokens the model generates in its completion.
This is crucial for two reasons:
- Cost Control: Preventing the model from generating exceptionally long, expensive responses.
- Output Management: Ensuring responses fit within UI constraints.
- Example: If set to 50, the model stops generating immediately after producing the 50th token, even if that is in the middle of a sentence.
3. ChatSystemPrompt: Defining Persona
- Type: string
- Default Value: usually generic, like "Assistant is a large language model."
When using chat completion models (like gpt-3.5-turbo or gpt-4), the ChatSystemPrompt is essential. It sets the "system message," which defines the AI's persona, role, and fundamental rules of engagement.
Unlike the user prompt, which carries the specific task, the system prompt defines who the AI is while performing that task.
- Example Use:
- Standard: "You are a helpful AI assistant."
- Specific: "You are an expert historian specializing in the Roman Empire. Provide detailed, academically rigorous answers."
- Behavioral: "You are a JSON converter. Output only raw JSON without any markdown formatting or explanations."
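Using one of the personas above, the setting might be applied like this (a sketch):

```csharp
using Microsoft.SemanticKernel.Connectors.OpenAI;

var settings = new OpenAIPromptExecutionSettings
{
    // The system prompt defines who the AI is; the user prompt
    // supplied at invocation time carries the actual task.
    ChatSystemPrompt = "You are an expert historian specializing in the Roman Empire. " +
                       "Provide detailed, academically rigorous answers.",
    Temperature = 0.3
};
```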
4. User: Tracking End-Users
- Type: string
- Value: a unique identifier string
The User property allows you to pass a unique identifier representing your end-user to OpenAI. This does not change the AI's output directly, but it is vital for enterprise applications. It helps OpenAI monitor usage patterns, detect abuse, and assist with debugging issues linked to specific user sessions.
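A sketch of attaching a user identifier (the identifier shown is a made-up placeholder; in practice you would use a stable, opaque id from your own user store):

```csharp
using Microsoft.SemanticKernel.Connectors.OpenAI;

var settings = new OpenAIPromptExecutionSettings
{
    // Prefer an opaque internal id over an email address or other
    // personally identifiable information.
    User = "user-8f2c1a"
};
```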
5. Logprobs and TopLogprobs: Inspecting Confidence
AI generation is inherently non-deterministic. Even with an identical prompt, you might get different answers, because the model samples each token based on probabilities.
These two settings are designed for debugging and analyzing the model's "confidence" in its choices.
Logprobs
- Type: bool? (nullable boolean)
- Default Value: false
Setting Logprobs = true instructs the model to return metadata about the logarithmic probabilities of the tokens it generated. It essentially asks the AI, "Show me the math behind why you chose these words."
When enabled, the response metadata will contain ContentTokenLogProbabilities, revealing how strongly the model felt about its chosen token versus alternatives.
TopLogprobs
- Type: int? (nullable integer)
- Value: an integer specifying how many alternatives to show
If Logprobs is true, TopLogprobs defines how many alternative token choices you want to see for each generated token.
- Example: If set to 5, for every token the AI generates, it will report the top 5 candidate tokens it considered at that step, along with their respective log probabilities.
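Putting the two settings together, a sketch of requesting and inspecting log probabilities (assuming an OpenAI chat connector; the metadata value's concrete shape depends on the connector version, so it is printed here rather than deconstructed):

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;

var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(modelId: "gpt-4", apiKey: "<your-api-key>")
    .Build();

var settings = new OpenAIPromptExecutionSettings
{
    Logprobs = true,   // ask for log-probability metadata
    TopLogprobs = 5    // include the top 5 alternatives per token
};

var result = await kernel.InvokePromptAsync(
    "What is the capital of France?",
    new KernelArguments(settings));

// The probabilities come back in the result metadata rather than the text.
if (result.Metadata?.TryGetValue("ContentTokenLogProbabilities", out var logProbs) == true)
{
    Console.WriteLine(logProbs);
}
```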