Temperature-Scaled Softmax for Renormalized Probability
To control the randomness in token selection, the probability distribution can be reshaped using a temperature parameter, β. The renormalized conditional probability of a token w, given the context c, is calculated by applying a temperature-scaled Softmax function to its logit, l_w, and normalizing over a restricted set of candidate tokens V'. The formula is:

$$\Pr(w \mid c) = \frac{\exp(l_w / \beta)}{\sum_{w' \in V'} \exp(l_{w'} / \beta)}$$

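The definition above can be sketched directly in code. This is a minimal illustration, not a production decoder: the function name and the dictionary-based logits are assumptions made for the example, and the max-logit subtraction is a standard numerical-stability trick that does not change the result.

```python
import math

def temperature_softmax(logits, beta=1.0, candidates=None):
    """Renormalize logits over a restricted candidate set V' at temperature beta.

    logits: dict mapping token -> raw score l_w
    beta: temperature; beta < 1 sharpens the distribution, beta > 1 flattens it
    candidates: restricted candidate set V' (defaults to all tokens)
    """
    if candidates is None:
        candidates = list(logits)
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits[t] for t in candidates)
    exps = {t: math.exp((logits[t] - m) / beta) for t in candidates}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}
```

For example, `temperature_softmax({'mat': 3.0, 'rug': 2.5}, beta=0.5)` concentrates more mass on `'mat'` than the same call with `beta=1.0`, since each logit gap is effectively doubled before normalization.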
Tags
Ch.5 Inference - Foundations of Large Language Models
Computing Sciences
Related
Token Sampling from a Conditional Probability Distribution
A language model has calculated the following raw scores (logits) for the next potential token:
{'mat': 3.0, 'rug': 2.5, 'chair': 2.0, 'moon': -1.0}. To control the randomness of the output, a temperature parameter is applied to these scores before they are converted into a final probability distribution for sampling. Which of the following probability distributions most likely resulted from applying a low temperature (e.g., a value less than 1.0)?

Troubleshooting a Factual Chatbot's Output
You are configuring a text generation model for different tasks. Match each task with the description of the temperature setting that would be most appropriate to achieve the desired output.
Learn After
Token Sampling from a Conditional Probability Distribution
A language model is calculating the next token's probability distribution over a set of four candidate tokens. The raw output scores (logits) for these tokens are: {Token A: 4.0, Token B: 3.8, Token C: 1.5, Token D: 1.2}. The current generation process uses a temperature parameter β = 1.0. A developer wants to modify the process to make the model's output less predictable and increase the likelihood of selecting Token B relative to Token A. Which of the following adjustments to the temperature parameter β would best achieve this goal?

Effect of Temperature on Probability Distributions
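The effect asked about above can be checked numerically. Assuming the standard temperature-scaled Softmax in which each logit is divided by β, the ratio P(B)/P(A) equals exp((3.8 − 4.0)/β), which moves toward 1 as β grows; this sketch compares β = 1.0 and β = 2.0 for the logits in the question.

```python
import math

logits = {"A": 4.0, "B": 3.8, "C": 1.5, "D": 1.2}

def softmax(scores, beta):
    # Divide each logit by the temperature beta, then normalize.
    exps = {t: math.exp(v / beta) for t, v in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

for beta in (1.0, 2.0):
    p = softmax(logits, beta)
    # Raising beta flattens the distribution, so P(B)/P(A) rises toward 1.
    print(beta, round(p["B"] / p["A"], 3))
```

With β = 1.0 the ratio is exp(−0.2) ≈ 0.819; with β = 2.0 it is exp(−0.1) ≈ 0.905, so a higher temperature both reduces predictability and raises Token B's likelihood relative to Token A.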
Parameter Tuning for Text Generation Tasks
You are tuning decoding for an internal "meeting-n...
You’re deploying an LLM to draft customer-facing i...
You’re building an internal “RFP response drafter”...
You’re implementing an LLM feature that generates ...
Post-incident analysis: fixing repetition and truncation by tuning decoding
Debugging Decoding: Balancing Determinism, Diversity, and Length in a Regulated Product
Selecting and Justifying a Decoding Policy for Two Production Use Cases
Choosing a Decoding Configuration Under Latency, Diversity, and Length Constraints
Release-readiness decision: decoding configuration for a customer-facing summarization feature
Decoding policy decision for a multilingual support assistant under safety, latency, and verbosity constraints