Learn Before

Conditional Probability Formula for Autoregressive Models using Softmax

Multiple Choice

An autoregressive language model calculates unnormalized scores (logits) for a set of candidate next tokens. These scores are then transformed into a probability distribution. What is the primary reason for applying an exponential function to each logit before the final normalization step?
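
The computation described in the question can be sketched in a few lines. This is a minimal illustration in plain Python, not any particular model's implementation: the function name `softmax`, the example logit values, and the max-subtraction step (a common numerical-stability habit that does not change the result) are all assumptions added here. Note that the logits may be negative, while every exponentiated value is strictly positive, so dividing by their sum gives values that can be read as probabilities.

```python
import math

def softmax(logits):
    """Turn unnormalized scores into probabilities: exponentiate, then normalize."""
    # Subtracting the maximum is an assumed stability step; it cancels out
    # in the ratio below and leaves the resulting probabilities unchanged.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]  # every entry is > 0
    total = sum(exps)
    return [e / total for e in exps]  # entries are positive and sum to 1

# Hypothetical scores for three candidate next tokens (one is negative).
logits = [2.0, -1.0, 0.5]
probs = softmax(logits)

for z, p in zip(logits, probs):
    print(f"logit {z:+.1f} -> probability {p:.3f}")
```

Running the sketch with other logit values, including all-negative ones, shows the same pattern: the ordering of the scores is preserved and the outputs remain a valid distribution.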

Updated 2025-10-08

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences