Search Space Pruning in LLM Decoding
Because the search space grows exponentially with sequence length, exhaustive search is computationally infeasible; practical decoding algorithms therefore rely on pruning strategies. These methods identify and discard low-quality or unpromising partial sequences at each step of the generation process, focusing computational effort on a smaller, more manageable set of high-potential candidates.
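As a concrete illustration, here is a minimal, self-contained sketch of step-wise pruning in the style of beam search. The toy next_token_logprobs distribution, the function names, and the beam_width parameter are illustrative assumptions, not details from the source:

```python
import math
from typing import List, Tuple

def next_token_logprobs(prefix: Tuple[str, ...]) -> List[Tuple[str, float]]:
    """Toy stand-in for a language model: returns (token, log-prob) pairs
    for the next position. A real LLM would condition on `prefix`."""
    vocab = {"the": 0.5, "cat": 0.2, "sat": 0.15, "on": 0.1, "mat": 0.05}
    return [(tok, math.log(p)) for tok, p in vocab.items()]

def expand_and_prune(beam_width: int, steps: int) -> List[Tuple[Tuple[str, ...], float]]:
    """At each step, expand every surviving hypothesis by all candidate
    tokens, then prune: keep only the top `beam_width` hypotheses by
    cumulative log-probability."""
    hypotheses: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in hypotheses:
            for tok, logp in next_token_logprobs(seq):
                candidates.append((seq + (tok,), score + logp))
        # The pruning step: discard everything outside the top beam_width,
        # so the frontier stays constant instead of growing exponentially.
        candidates.sort(key=lambda c: c[1], reverse=True)
        hypotheses = candidates[:beam_width]
    return hypotheses

if __name__ == "__main__":
    for seq, score in expand_and_prune(beam_width=5, steps=3):
        print(" ".join(seq), f"(cumulative log-prob {score:.2f})")
```

Note that the sort-and-truncate line is where the trade-off lives: every hypothesis it drops is gone for good, even if one of them would have led to the best overall sequence.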

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Search Space Pruning in LLM Decoding
A language model with a vocabulary of 30,000 unique tokens is generating a response. If the model were to perform a complete, exhaustive search to find the absolute best possible 5-token sequence, which calculation represents the total number of unique sequences it would need to evaluate? (A worked version of this count appears after this list.)
Evaluating a Decoding Strategy Proposal
Decoding Strategy Post-Mortem
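For reference, the exhaustive count in the question above follows from the fact that each of the L positions in the sequence can hold any of the |V| vocabulary tokens (a sketch of the arithmetic, assuming a fixed length of exactly 5 tokens and no early stopping):

```latex
|V|^{L} = 30{,}000^{5} = 2.43 \times 10^{22} \text{ candidate sequences}
```

At roughly 10^22 candidates, scoring each one is far beyond any practical compute budget, which is precisely the motivation for pruning.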
Learn After
Greedy Search (Greedy Decoding)
Formula for Pruned Step-wise Expansion of the Hypothesis Set
A language model is generating a sentence and must decide on the next word. It has identified 100 possible words, each with an associated probability. To manage computational resources, the model employs a strategy that discards all but the top 5 most probable words before considering the subsequent step. Which of the following statements best analyzes the primary trade-off inherent in this strategy? (A formula sketch for this kind of step-wise pruning appears after this list.)
Analyzing Text Generation System Performance
Rationale for Decoding Heuristics
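The top-5 question above, and the "Formula for Pruned Step-wise Expansion of the Hypothesis Set" it relates to, describe the same mechanism. One common way to write it (a sketch; the symbols H_t for the hypothesis set at step t, k for the number of kept hypotheses, and ⊕ for token concatenation are our notation, not necessarily the source's):

```latex
\tilde{H}_t = \{\, h \oplus v \mid h \in H_{t-1},\ v \in V \,\},
\qquad
H_t = \operatorname{top-}k\bigl(\tilde{H}_t;\ \log P\bigr),
\qquad
|H_t| = k
```

With k = 5 and 100 candidate words, each step scores only 5 × 100 = 500 continuations rather than an exponentially growing frontier; the trade-off is that the prefix of the globally best sequence is lost whenever it momentarily falls outside the top k.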