Concept

Search Space Pruning in LLM Decoding

To manage the computationally infeasible size of the search space, practical decoding algorithms use pruning strategies. These methods identify and discard low-quality or unpromising sequences at each step of generation, focusing computational effort on a smaller, more manageable set of high-potential candidates. Beam search is a canonical example: at every step it expands each surviving candidate and then keeps only a fixed number (the beam width) of the highest-scoring partial sequences.
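The pruning idea can be sketched with a minimal beam search over a toy model. The vocabulary, the `next_token_logprobs` scoring function, and its probabilities are all hypothetical stand-ins for a real LLM's next-token distribution; only the pruning logic itself is the point.

```python
import math

def next_token_logprobs(prefix):
    """Hypothetical toy model over a 3-token vocabulary.

    A real decoder would query an LLM here; these fixed probabilities
    merely prefer 'a' early on and the end token '</s>' later.
    """
    if len(prefix) < 2:
        probs = {"a": 0.6, "b": 0.3, "</s>": 0.1}
    else:
        probs = {"a": 0.2, "b": 0.2, "</s>": 0.6}
    return {t: math.log(p) for t, p in probs.items()}

def beam_search(beam_width=2, max_len=4):
    # Each candidate is (token list, cumulative log-probability).
    beams = [([], 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens and tokens[-1] == "</s>":
                # Finished sequences are carried forward unchanged.
                candidates.append((tokens, score))
                continue
            for tok, lp in next_token_logprobs(tokens).items():
                candidates.append((tokens + [tok], score + lp))
        # Pruning step: sort all expansions by score and keep only the
        # beam_width best; everything else is discarded permanently.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

if __name__ == "__main__":
    for tokens, score in beam_search():
        print(" ".join(tokens), round(score, 3))
```

With a beam width of 2, only two partial sequences survive each step, so the work per step stays constant instead of growing exponentially with the vocabulary size.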

Updated 2026-05-03

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences