Learn Before
Analyzing Search Algorithm Behavior
Imagine a text generation model is producing a sequence. At the first step, the most probable next word is 'apple' with a log-probability of -0.8. The second most probable word is 'apricot' with a log-probability of -0.9. A simple greedy approach would select 'apple'. However, the best complete sequence actually starts with 'apricot'. Explain, in detail, the mechanism by which a search process that keeps track of multiple hypotheses at each step could arrive at the better overall sequence, even though it did not start with the most probable first word.
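One way to make the mechanism concrete is a small sketch of a search that keeps several hypotheses per step. Only the first-step log-probabilities for 'apple' (-0.8) and 'apricot' (-0.9) come from the scenario above; the continuation words, their log-probabilities, the `TOY_MODEL` table, and the `beam_search` helper are all hypothetical, chosen so that 'apricot' leads to a much more probable second word.

```python
# Hypothetical next-token log-probabilities, keyed by the prefix so far.
# Only the first-step values for 'apple' and 'apricot' come from the text;
# every other token and number is invented for illustration.
TOY_MODEL = {
    (): {"apple": -0.8, "apricot": -0.9},
    ("apple",): {"pie": -2.0, "tree": -2.3},
    ("apricot",): {"jam": -0.3, "tart": -2.5},
}


def beam_search(model, beam_width, steps):
    """Return the surviving (sequence, score) pairs, best first."""
    beams = [((), 0.0)]  # start from the empty sequence with score 0
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            # Extend every kept hypothesis by every possible next token.
            for token, logp in model.get(seq, {}).items():
                candidates.append((seq + (token,), score + logp))
        if not candidates:
            break
        # Prune: keep only the highest cumulative log-probabilities.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


greedy = beam_search(TOY_MODEL, beam_width=1, steps=2)
beam = beam_search(TOY_MODEL, beam_width=2, steps=2)
print("greedy:", greedy[0])
print("beam:  ", beam[0])
```

Under these assumed numbers, a width of 1 behaves like greedy search: it commits to 'apple' and can never revisit 'apricot'. With a width of 2, 'apricot' survives the first pruning step, and its cumulative score overtakes every 'apple' continuation once the second word is added.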
Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Beam Width (K)
Top-K Token Selection in Beam Search
A text generation model is creating a sequence of words. It uses a search process that keeps track of the 2 most probable sequences at each step. The score for a sequence is the sum of the log-probabilities of its words. Given the state of the search below, which two sequences will be kept for the next step?
Step 1: The initial two sequences being tracked are:
- Sequence 1: "The" (Score: -0.5)
- Sequence 2: "A" (Score: -0.9)
Step 2: The model calculates the log-probabilities for the next possible words for each sequence:
- Expanding "The":
  - "cat": -0.8
  - "dog": -1.1
- Expanding "A":
  - "mouse": -0.2
  - "lion": -1.5
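The pruning step described in this question can be checked with a short sketch. It uses the scores and log-probabilities listed above; the variable names and the use of `heapq.nlargest` are illustrative choices, not part of the original question.

```python
import heapq

# Scores of the two sequences tracked after Step 1 (from the question).
beams = {"The": -0.5, "A": -0.9}

# Log-probabilities of the candidate next words (from the question).
next_logprobs = {
    "The": {"cat": -0.8, "dog": -1.1},
    "A": {"mouse": -0.2, "lion": -1.5},
}

# Score of an extended sequence = prefix score + log-probability of the new word.
candidates = [
    (f"{prefix} {word}", score + logp)
    for prefix, score in beams.items()
    for word, logp in next_logprobs[prefix].items()
]

# Keep the 2 highest-scoring sequences for the next step.
kept = heapq.nlargest(2, candidates, key=lambda c: c[1])
for seq, score in kept:
    print(f"{seq}: {score:.1f}")
# A mouse: -1.1
# The cat: -1.3
```

Note that "A mouse" ends up ranked first even though "A" started with the lower score, because pruning compares cumulative scores across all four extensions rather than within each prefix.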
Analyzing Search Algorithm Behavior
Diagnosing a Flaw in Sequence Generation
You are tuning decoding for an internal "meeting-n...
You're deploying an LLM to draft customer-facing i...
You're building an internal "RFP response drafter"...
You're implementing an LLM feature that generates ...
Post-incident analysis: fixing repetition and truncation by tuning decoding
Debugging Decoding: Balancing Determinism, Diversity, and Length in a Regulated Product
Selecting and Justifying a Decoding Policy for Two Production Use Cases
Choosing a Decoding Configuration Under Latency, Diversity, and Length Constraints
Release-readiness decision: decoding configuration for a customer-facing summarization feature
Decoding policy decision for a multilingual support assistant under safety, latency, and verbosity constraints