Beam Search Strategy in Sequence-to-Sequence Models
Beam search is a sequence decoding strategy that strikes a compromise between the computational efficiency of greedy search and the optimality of exhaustive search. Instead of greedily picking the single most likely token or exploring all possible paths, beam search evaluates and retains a predetermined number of the most promising candidate sequences at each step of the generation process.
0
1
Contributors are:
Who are from:
References
Speech and Language Processing (3rd ed. draft)
Reference of Foundations of Large Language Models Course
Reference of Foundations of Large Language Models Course
Reference of Foundations of Large Language Models Course
Reference of Foundations of Large Language Models Course
Reference of Foundations of Large Language Models Course
Reference of Foundations of Large Language Models Course
Dive into Deep Learning
Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
D2L
Dive into Deep Learning @ D2L
Related
Auto-regressive Decoding in Machine Translation
An autoregressive sequence generation model is tasked with producing an output. At each step, it calculates the probability for every possible next element and selects the single element with the highest probability before moving to the next step. What is the primary limitation of this step-by-step selection strategy?
Decoder Input Analysis
Diagnosing Translation Degradation
Teacher Forcing
Beam Search Strategy in Sequence-to-Sequence Models
Greedy Search (Greedy Decoding)
A language model is generating a sequence of tokens. The total log-probability for the partially generated sequence 'The quick brown' has been calculated as -3.5. In the very next step, the model computes the conditional log-probability for the token 'fox' as -1.2. What is the new total log-probability for the complete sequence 'The quick brown fox'?
A language model is generating a sequence. The table below shows the conditional log-probability for each new token and the claimed total accumulated log-probability for the sequence up to that point. Analyze the table to identify the first step where the total accumulated log-probability is calculated incorrectly based on the principle of incremental summation.
Step Token Conditional log-prob Total Accumulated log-prob 1 'The' -0.9 -0.9 2 'cat' -1.5 -2.4 3 'sat' -1.1 -2.6 Comparing Generation Paths
Beam Search Strategy in Sequence-to-Sequence Models
Optimizing Creative Text Generation
A machine learning engineer is attempting to improve the quality of summaries generated by a large language model. Their strategy is to drastically expand the number of potential summary sequences the model considers before selecting the final output. Which of the following statements best evaluates the primary trade-off of this approach?
Improving Complex Reasoning in LLMs
In the context of generating output from a language model, broadening the search for potential sequences is a guaranteed method to improve the factual accuracy of the final output, assuming sufficient computational resources are available.
Beam Search Strategy in Sequence-to-Sequence Models
Computational Cost of Greedy Search in Sequence-to-Sequence Models
Greedy Search as a Special Case of Beam Search
Beam Search Strategy in Sequence-to-Sequence Models
Computational Cost of Exhaustive Search in Sequence-to-Sequence Models
Beam Search Strategy in Sequence-to-Sequence Models
Learn After
Top-K Token Selection in Beam Search
A text generation model is creating a sequence of words. It uses a search process that keeps track of the 2 most probable sequences at each step. The score for a sequence is the sum of the log-probabilities of its words. Given the state of the search below, which two sequences will be kept for the next step?
Step 1: The initial two sequences being tracked are:
- Sequence 1: "The" (Score: -0.5)
- Sequence 2: "A" (Score: -0.9)
Step 2: The model calculates the log-probabilities for the next possible words for each sequence:
- Expanding "The":
- "cat": -0.8
- "dog": -1.1
- Expanding "A":
- "mouse": -0.2
- "lion": -1.5
Analyzing Search Algorithm Behavior
Diagnosing a Flaw in Sequence Generation
You are tuning decoding for an internal "meeting-n...
You’re deploying an LLM to draft customer-facing i...
You’re building an internal “RFP response drafter”...
You’re implementing an LLM feature that generates ...
Post-incident analysis: fixing repetition and truncation by tuning decoding
Debugging Decoding: Balancing Determinism, Diversity, and Length in a Regulated Product
Selecting and Justifying a Decoding Policy for Two Production Use Cases
Choosing a Decoding Configuration Under Latency, Diversity, and Length Constraints
Release-readiness decision: decoding configuration for a customer-facing summarization feature
Decoding policy decision for a multilingual support assistant under safety, latency, and verbosity constraints
Computational Cost of Beam Search in Sequence-to-Sequence Models
Beam Size in Beam Search