Learn Before
Top-K Token Selection in Beam Search
In the beam search algorithm, expanding a hypothesis starts from a parent node, which represents a given prefix sequence (y1, ..., y_{i-1}), and selects the K most probable next tokens from the vocabulary. This step generates K new, longer candidate sequences to be considered in the next stage of the search.
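The expansion step above can be sketched in a few lines. This is a minimal illustration, not a production decoder: the function name, the toy vocabulary, and the probabilities are assumptions chosen for the example.

```python
import math

def expand_hypothesis(prefix, prefix_score, next_token_probs, k):
    """Return the K most probable one-token extensions of `prefix`.

    `next_token_probs` maps each candidate token to P(token | prefix);
    sequence scores accumulate as sums of log-probabilities.
    """
    # Sort candidate tokens by probability and keep the top K.
    top_k = sorted(next_token_probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Each new candidate pairs a longer sequence with its updated score.
    return [(prefix + [tok], prefix_score + math.log(p)) for tok, p in top_k]

# Hypothetical parent node "The" with score -0.5 and a toy next-token distribution.
candidates = expand_hypothesis(
    ["The"], -0.5,
    {"cat": 0.45, "dog": 0.33, "fish": 0.12, "car": 0.10},
    k=2,
)
print(candidates)
```

With K = 2, the parent hypothesis yields two longer candidates ("The cat" and "The dog"), each carrying the parent's score plus the log-probability of the new token.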

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Beam Width (K)
Top-K Token Selection in Beam Search
A text generation model is creating a sequence of words. It uses a search process that keeps track of the 2 most probable sequences at each step. The score for a sequence is the sum of the log-probabilities of its words. Given the state of the search below, which two sequences will be kept for the next step?
Step 1: The initial two sequences being tracked are:
- Sequence 1: "The" (Score: -0.5)
- Sequence 2: "A" (Score: -0.9)
Step 2: The model calculates the log-probabilities for the next possible words for each sequence:
- Expanding "The":
- "cat": -0.8
- "dog": -1.1
- Expanding "A":
- "mouse": -0.2
- "lion": -1.5
Analyzing Search Algorithm Behavior
Diagnosing a Flaw in Sequence Generation
You are tuning decoding for an internal "meeting-n...
You’re deploying an LLM to draft customer-facing i...
You’re building an internal “RFP response drafter”...
You’re implementing an LLM feature that generates ...
Post-incident analysis: fixing repetition and truncation by tuning decoding
Debugging Decoding: Balancing Determinism, Diversity, and Length in a Regulated Product
Selecting and Justifying a Decoding Policy for Two Production Use Cases
Choosing a Decoding Configuration Under Latency, Diversity, and Length Constraints
Release-readiness decision: decoding configuration for a customer-facing summarization feature
Decoding policy decision for a multilingual support assistant under safety, latency, and verbosity constraints
Learn After
Construction of Top-K Candidate Sequences in Beam Search
Mathematical Definition of Top-K Token Selection
A language model is generating text using a search algorithm. At a certain step, it has the partial sequence 'The cat sat on the' and calculates the following probabilities for the next word from its vocabulary:
Word     Probability
mat      0.45
rug      0.25
chair    0.15
floor    0.10
table    0.03
window   0.02
If the algorithm is configured to select the 3 most probable next words at this step, which set of words will be chosen to create new candidate sequences?
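The selection asked about above is a straightforward top-K pick over the table's probabilities. A short sketch, using the values from the question:

```python
# Next-word probabilities from the question above.
probs = {"mat": 0.45, "rug": 0.25, "chair": 0.15,
         "floor": 0.10, "table": 0.03, "window": 0.02}

# Top-K selection with K = 3: sort words by probability, keep the first three.
k = 3
top_k = sorted(probs, key=probs.get, reverse=True)[:k]
print(top_k)  # ['mat', 'rug', 'chair']
```

The three most probable words (mat, rug, chair) become the prefixes' extensions; the remaining vocabulary is discarded at this step.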
Debugging a Text Generation System
A text generation system is designed to explore multiple possible sentence continuations at each step. It does this by selecting a fixed number of the most probable next words from its entire vocabulary. Match each parameter setting or concept with its most likely consequence or definition.