Beam search
Beam search, also referred to as beam decoding, is an algorithm that extends greedy search by exploring multiple candidate paths simultaneously. Rather than committing to the single most probable token at each generation step, beam search retains a fixed number of the highest-scoring candidate sequences; this number is a parameter known as the beam width. Candidates are typically scored by their cumulative log-probability, so at each step every retained sequence is extended by one token and only the top-scoring extensions survive. This strategy helps mitigate the problem of high-quality sequences being discarded too early in the generation process.
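As a concrete illustration, here is a minimal sketch of the procedure in Python. The names `log_prob_fn`, `bos_id`, `eos_id`, `beam_width`, and `max_len` are hypothetical stand-ins for whatever model and vocabulary are in use, not part of any particular library:

```python
from typing import Callable, List, Tuple

def beam_search(
    log_prob_fn: Callable[[List[int]], List[float]],  # hypothetical scorer: sequence -> log-prob per vocab token
    bos_id: int,
    eos_id: int,
    beam_width: int = 4,
    max_len: int = 20,
) -> List[int]:
    """Return the highest-scoring sequence found under a fixed beam width."""
    # Each live hypothesis is (token_ids, cumulative log-probability).
    beams: List[Tuple[List[int], float]] = [([bos_id], 0.0)]
    finished: List[Tuple[List[int], float]] = []

    for _ in range(max_len):
        # Expand every retained sequence by every possible next token.
        candidates: List[Tuple[List[int], float]] = []
        for seq, score in beams:
            for token_id, lp in enumerate(log_prob_fn(seq)):
                candidates.append((seq + [token_id], score + lp))
        # Keep only the beam_width highest-scoring extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            (finished if seq[-1] == eos_id else beams).append((seq, score))
        if not beams:
            break

    # Fall back to unfinished hypotheses if nothing reached EOS.
    finished.extend(beams)
    return max(finished, key=lambda c: c[1])[0]
```

Production implementations batch the expansion step across beams and usually apply length normalization when comparing finished hypotheses, but the keep-the-top-K loop above is the core of the technique.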
Related
Auto-regressive Decoding in Machine Translation
An autoregressive sequence generation model is tasked with producing an output. At each step, it calculates the probability for every possible next element and selects the single element with the highest probability before moving to the next step. What is the primary limitation of this step-by-step selection strategy?
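The underlying issue, made concrete by the two-token example further down: each locally optimal choice is irrevocable, so a token with the highest immediate probability can commit the model to continuations whose joint probability is lower than that of a path starting from a less probable token.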
Decoder Input Analysis
Diagnosing Translation Degradation
Greedy Search (Greedy Decoding)
A language model is generating a sequence of tokens. The total log-probability for the partially generated sequence 'The quick brown' has been calculated as -3.5. In the very next step, the model computes the conditional log-probability for the token 'fox' as -1.2. What is the new total log-probability for the complete sequence 'The quick brown fox'?
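Worked out, the accumulation is a simple sum: the probability of a sequence is the product of its tokens' conditional probabilities, which becomes a sum in log space, so the new total is -3.5 + (-1.2) = -4.7.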
A language model is generating a sequence. The table below shows the conditional log-probability for each new token and the claimed total accumulated log-probability for the sequence up to that point. Analyze the table to identify the first step where the total accumulated log-probability is calculated incorrectly based on the principle of incremental summation.
Step | Token | Conditional log-prob | Total accumulated log-prob
1 | 'The' | -0.9 | -0.9
2 | 'cat' | -1.5 | -2.4
3 | 'sat' | -1.1 | -2.6
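Checking the running sums: step 2 is -0.9 + (-1.5) = -2.4, matching the table, but step 3 should be -2.4 + (-1.1) = -3.5 rather than the claimed -2.6, so step 3 is the first incorrectly accumulated entry.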
Comparing Generation Paths
Mathematical Justification for Greedy Search
Construction of the Optimal Sequence in Greedy Search
Candidate Set in Greedy Search
A language model is generating a two-token sequence. At the first step, it calculates the probability for the next token: 'Token A' has a probability of 0.6, and 'Token B' has a probability of 0.4. If the model chooses 'Token A', the most probable subsequent token is 'Token C' (with a conditional probability of 0.5). If the model had chosen 'Token B', the most probable subsequent token would be 'Token D' (with a conditional probability of 0.9). A text generation algorithm is used that, at every step, commits to the single token with the highest immediate probability. Based on this process, which sequence will be generated and why?
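Worked out: the algorithm described is greedy, so it commits to 'Token A' (0.6 > 0.4) and then 'Token C', producing a sequence with joint probability 0.6 × 0.5 = 0.30, even though 'Token B' followed by 'Token D' would have scored 0.4 × 0.9 = 0.36; the locally best first choice blocks the globally best sequence.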
Algorithm Suitability for Text Generation Tasks
When generating a sequence of text, an algorithm that selects the single most probable token at each step is guaranteed to produce the overall most probable sequence.
Analyzing Suboptimal Outcomes in Text Generation
Selecting and Justifying a Decoding Policy for Two Production Use Cases
Debugging Decoding: Balancing Determinism, Diversity, and Length in a Regulated Product
Post-incident analysis: fixing repetition and truncation by tuning decoding
Choosing a Decoding Configuration Under Latency, Diversity, and Length Constraints
Release-readiness decision: decoding configuration for a customer-facing summarization feature
Decoding policy decision for a multilingual support assistant under safety, latency, and verbosity constraints
You are tuning decoding for an internal "meeting-n...
You’re implementing an LLM feature that generates ...
You’re building an internal “RFP response drafter”...
You’re deploying an LLM to draft customer-facing i...
Optimizing Creative Text Generation
A machine learning engineer is attempting to improve the quality of summaries generated by a large language model. Their strategy is to drastically expand the number of potential summary sequences the model considers before selecting the final output. Which of the following statements best evaluates the primary trade-off of this approach?
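As a rough cost model: with beam width K and vocabulary size |V|, each step scores roughly K × |V| candidate extensions and keeps the top K, so compute and memory grow approximately linearly in K, while quality gains typically diminish with larger K (and very wide beams are known to favor shorter, blander outputs).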
Improving Complex Reasoning in LLMs
In the context of generating output from a language model, broadening the search for potential sequences is a guaranteed method to improve the factual accuracy of the final output, assuming sufficient computational resources are available.
Learn After
Beam Width (K)
Top-K Token Selection in Beam Search
A text generation model is creating a sequence of words. It uses a search process that keeps track of the 2 most probable sequences at each step. The score for a sequence is the sum of the log-probabilities of its words. Given the state of the search below, which two sequences will be kept for the next step?
Step 1: The initial two sequences being tracked are:
- Sequence 1: "The" (Score: -0.5)
- Sequence 2: "A" (Score: -0.9)
Step 2: The model calculates the log-probabilities for the next possible words for each sequence:
- Expanding "The":
  - "cat": -0.8
  - "dog": -1.1
- Expanding "A":
  - "mouse": -0.2
  - "lion": -1.5
Analyzing Search Algorithm Behavior
Diagnosing a Flaw in Sequence Generation