Construction of the Optimal Sequence in Greedy Search
In a greedy search algorithm, the best (most probable) sequence at step i, denoted ŷ_{≤i}, is constructed by taking the sequence generated up to the previous step, ŷ_{<i}, and appending the single most probable next token, ŷ_i. The formal representation of this construction is:

ŷ_{≤i} = ŷ_{<i} · ŷ_i, where ŷ_i = argmax_{y_i} log Pr(y_i | x, ŷ_{<i})

Here "·" denotes appending the chosen token ŷ_i to the prefix ŷ_{<i}.
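As a minimal illustration (not from the original card), the same update rule as a Python sketch, assuming a hypothetical next_token_logprobs(x, prefix) helper that returns each candidate token's conditional log-probability:

```python
def greedy_step(x, prefix, next_token_logprobs):
    """One greedy-search step: extend the best-so-far sequence by one token.

    next_token_logprobs(x, prefix) is an assumed (hypothetical) helper that
    returns a dict mapping each candidate token to log Pr(token | x, prefix).
    """
    logprobs = next_token_logprobs(x, prefix)
    # y_hat_i: the argmax token under the conditional log-probabilities
    best_token = max(logprobs, key=logprobs.get)
    # y_hat_{<=i} = y_hat_{<i} with y_hat_i appended
    return prefix + [best_token]
```

Calling greedy_step repeatedly, feeding each returned prefix back in, realizes the construction ŷ_{≤i} = ŷ_{<i} · ŷ_i one token at a time.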
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Mathematical Justification for Greedy Search
Construction of the Optimal Sequence in Greedy Search
Candidate Set in Greedy Search
A language model is generating a two-token sequence. At the first step, it calculates the probabilities for the next token: 'Token A' has a probability of 0.6, and 'Token B' has a probability of 0.4. If the model chooses 'Token A', the most probable subsequent token is 'Token C' (with a conditional probability of 0.5). If the model had chosen 'Token B', the most probable subsequent token would be 'Token D' (with a conditional probability of 0.9). A text generation algorithm is used that, at every step, commits to the single token with the highest immediate probability. Based on this process, which sequence will be generated and why?
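Working both complete paths through (using only the probabilities stated above) makes the outcome concrete:

Pr('Token A', 'Token C') = 0.6 × 0.5 = 0.30
Pr('Token B', 'Token D') = 0.4 × 0.9 = 0.36

A greedy decoder commits to 'Token A' at the first step because 0.6 > 0.4, and therefore outputs 'Token A Token C', even though 'Token B Token D' is the more probable sequence overall.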
Algorithm Suitability for Text Generation Tasks
When generating a sequence of text, an algorithm that selects the single most probable token at each step is guaranteed to produce the overall most probable sequence.
Analyzing Suboptimal Outcomes in Text Generation
Selecting and Justifying a Decoding Policy for Two Production Use Cases
Debugging Decoding: Balancing Determinism, Diversity, and Length in a Regulated Product
Post-incident analysis: fixing repetition and truncation by tuning decoding
Choosing a Decoding Configuration Under Latency, Diversity, and Length Constraints
Release-readiness decision: decoding configuration for a customer-facing summarization feature
Decoding policy decision for a multilingual support assistant under safety, latency, and verbosity constraints
Beam search
Construction of the Optimal Sequence in Greedy Search
When generating text one token at a time, a greedy algorithm aims to select the token y_i at step i that maximizes the log-probability of the entire sequence up to that point, log Pr(y_1...y_i | x). This optimization problem can be simplified to choosing the token that maximizes only the conditional log-probability of the current token, log Pr(y_i | x, y_{<i}). Why is this simplification mathematically valid for finding the best current token y_i?

The simplification of the greedy search objective relies on a specific mathematical derivation. Arrange the following steps to correctly represent the logical flow of this derivation, which shows how maximizing the log-probability of the entire sequence up to the current step (log Pr(y_1...y_i | x)) is equivalent to maximizing the log-probability of just the current token (log Pr(y_i | x, y_{<i})).

Explaining Greedy Search Optimization
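For reference, the derivation these questions point to reduces to one application of the chain rule (a sketch, in the same notation as above):

log Pr(y_1...y_i | x) = log Pr(y_{<i} | x) + log Pr(y_i | x, y_{<i})

Because greedy search has already committed to the prefix y_{<i}, the first term on the right is a constant with respect to the choice of y_i, so maximizing the full sequence log-probability over y_i is equivalent to maximizing log Pr(y_i | x, y_{<i}) alone.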
Construction of the Optimal Sequence in Greedy Search
An autoregressive model generates the token sequence y = y_1 y_2 ... y_n, where y_1 is the first token, y_2 is the second, and so on. What does the notation y_{<i} represent in this specific sequence?
True or False: For an autoregressive model generating the output sequence y = y_1 y_2 ... y_n, the notation y_{<i} represents the complete subsequence y_1 y_2 ... y_{i-1}.
Formula for Constructing Top-K Candidate Sequences
An autoregressive language model is generating a sequence of tokens, one at a time. To predict the fifth token in the sequence, denoted as y_5, the model uses all the previously generated tokens as context. The standard notation for this preceding subsequence of tokens is ____.
Learn After
Formula for the Candidate Set in Greedy Search
A language model is generating text one token at a time by always selecting the single most probable next token. It has already produced the sequence 'The sun is shining'. For the very next step, the model calculates the following conditional probabilities for the next token:
- P(brightly | 'The sun is shining') = 0.55
- P(today | 'The sun is shining') = 0.25
- P(and | 'The sun is shining') = 0.15
- P(down | 'The sun is shining') = 0.05
Based on this method of construction, what will the updated sequence be after this step?
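A minimal Python sketch of this selection step (the dictionary simply restates the probabilities given in the question; everything else is illustrative):

```python
# Conditional probabilities for the next token given 'The sun is shining'
next_token_probs = {
    "brightly": 0.55,
    "today": 0.25,
    "and": 0.15,
    "down": 0.05,
}

prefix = "The sun is shining"
# Greedy construction: append the single most probable (argmax) token
best_token = max(next_token_probs, key=next_token_probs.get)
print(prefix + " " + best_token)  # -> The sun is shining brightly
```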
A language model generates text by always appending the single most probable token given the sequence generated so far. Arrange the following steps to correctly illustrate how the model would construct the three-token sequence 'The quick fox'.
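One way to picture those steps is the following hedged Python sketch, where most_probable_token(prefix) is a hypothetical stand-in for the model's argmax prediction:

```python
def construct_greedily(most_probable_token, num_tokens):
    """Build a sequence by repeatedly appending the single most probable token.

    most_probable_token(prefix) is an assumed helper that returns the model's
    most probable next token given the tokens generated so far.
    """
    sequence = []
    for _ in range(num_tokens):
        sequence.append(most_probable_token(sequence))
    return sequence

# For the question's target 'The quick fox': step 1 appends 'The', step 2
# appends 'quick', and step 3 appends 'fox', each being the argmax at its step.
```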
Analyzing a Sequence Construction Method