Multiple Choice

When generating text one token at a time, a greedy algorithm aims to select the token y_i at step i that maximizes the log-probability of the entire sequence up to that point, log Pr(y_1...y_i | x). This optimization problem can be simplified to choosing the token that maximizes only the conditional log-probability of the current token, log Pr(y_i | x, y_{<i}). Why is this simplification mathematically valid for finding the best current token y_i?
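The reasoning can be sketched numerically. By the chain rule, log Pr(y_1...y_i | x) = log Pr(y_{<i} | x) + log Pr(y_i | x, y_{<i}); since the prefix term is already fixed at step i, adding it cannot change which token attains the maximum. The toy vocabulary and probabilities below are invented purely for illustration:

```python
import math

# Toy next-token distribution (invented values, for illustration only).
vocab = ["the", "cat", "sat"]
cond_logprob = {"the": math.log(0.2), "cat": math.log(0.7), "sat": math.log(0.1)}

# log Pr(y_{<i} | x): log-probability of the already-chosen prefix.
# It is a constant with respect to the current choice y_i.
prefix_logprob = math.log(0.05)

# Chain rule: log Pr(y_1...y_i | x) = log Pr(y_{<i} | x) + log Pr(y_i | x, y_{<i})
joint_argmax = max(vocab, key=lambda y: prefix_logprob + cond_logprob[y])
cond_argmax = max(vocab, key=lambda y: cond_logprob[y])

# Adding the same constant to every candidate's score preserves the argmax.
assert joint_argmax == cond_argmax
print(joint_argmax)  # -> cat
```

This is why greedy decoding only ever needs the model's conditional distribution over the next token, never the full joint sequence probability.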


Updated 2025-09-26


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science