An autoregressive model is generating a sequence and has computed the following unnormalized scores (logits) for three candidate next tokens: Token A (3.0), Token B (1.0), and Token C (0.0). If a constant value of 10.0 is added to each of these three logits before the final probability normalization step, how will the resulting conditional probabilities for the tokens be affected?
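One way to check the effect is to compute the probabilities with and without the shift. The sketch below assumes the normalization step is a standard softmax, which the question implies but does not name. The helper name `softmax` and the max-subtraction trick for numerical stability are illustrative choices, not part of the question. Adding a constant c multiplies every exponentiated score by the same factor e^c, so that factor should cancel in the ratio and leave the probabilities unchanged (roughly 0.844, 0.114, and 0.042 for A, B, and C).

```python
import math


def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    # Subtracting the maximum is a common stability trick (an assumption
    # added here); it does not change the result, for the same reason
    # that adding a constant does not.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Logits from the question.
logits = {"A": 3.0, "B": 1.0, "C": 0.0}
shift = 10.0

original = softmax(list(logits.values()))
shifted = softmax([x + shift for x in logits.values()])

# Print each token's probability before and after the shift.
for token, p, q in zip(logits, original, shifted):
    print(f"Token {token}: {p:.4f} -> {q:.4f}")
```

The two columns should match, because only the differences between logits affect the output of a softmax.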
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Token Sampling from a Conditional Probability Distribution
Calculating Next-Token Probability
An autoregressive language model calculates unnormalized scores (logits) for a set of candidate next tokens. These scores are then transformed into a probability distribution. What is the primary reason for applying an exponential function to each logit before the final normalization step?