Multiple Choice

When generating text one token at a time, a greedy algorithm aims to select the token y_i at step i that maximizes the log-probability of the entire sequence up to that point, log Pr(y_1...y_i | x). This optimization problem can be simplified to choosing the token that maximizes only the conditional log-probability of the current token, log Pr(y_i | x, y_{<i}). Why is this simplification mathematically valid for finding the best current token y_i?
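The reasoning can be sketched numerically. By the chain rule, log Pr(y_1...y_i | x) = log Pr(y_{<i} | x) + log Pr(y_i | x, y_{<i}); since the prefix term is already fixed at step i, adding it cannot change which token attains the maximum. The toy vocabulary and probabilities below are invented purely for illustration:

```python
import math

# Toy next-token distribution (invented values, for illustration only).
vocab = ["the", "cat", "sat"]
cond_logprob = {"the": math.log(0.2), "cat": math.log(0.7), "sat": math.log(0.1)}

# log Pr(y_{<i} | x): log-probability of the already-chosen prefix.
# It is a constant with respect to the current choice y_i.
prefix_logprob = math.log(0.05)

# Chain rule: log Pr(y_1...y_i | x) = log Pr(y_{<i} | x) + log Pr(y_i | x, y_{<i})
joint_argmax = max(vocab, key=lambda y: prefix_logprob + cond_logprob[y])
cond_argmax = max(vocab, key=lambda y: cond_logprob[y])

# Adding the same constant to every candidate's score preserves the argmax.
assert joint_argmax == cond_argmax
print(joint_argmax)  # -> cat
```

This is why greedy decoding only ever needs the model's conditional distribution over the next token, never the full joint sequence probability.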


Updated 2025-09-26


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science