Concept

The Search Problem in LLM Inference

In Large Language Model (LLM) inference, beyond the model computation problem of calculating $\Pr(\mathbf{y}\mid\mathbf{x})$, lies the search problem. This problem focuses on how to efficiently find the best output sequence $\hat{\mathbf{y}}$ for a given input sequence $\mathbf{x}$ (or the generated KV cache). A naive approach is exhaustive search, which considers every possible output sequence and selects the one with the highest prediction probability. While this method guarantees a globally optimal solution, a direct exhaustive search is impractical for LLMs because the number of potential output sequences grows exponentially with the length of $\mathbf{y}$.
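The sketch below illustrates why exhaustive search blows up. It is a minimal, hypothetical example: the vocabulary, the `next_token_logprob` function, and the uniform distribution are stand-ins, not part of any real LLM; a real system would score prefixes with the model's softmax outputs. Even with only 4 tokens and a maximum length of 5, the search already examines over a thousand candidate sequences, and the count grows as $|V|^T$.

```python
import itertools
import math

# Toy vocabulary; a real LLM vocabulary has tens of thousands of tokens.
VOCAB = ["<eos>", "a", "b", "c"]

def next_token_logprob(prefix, token):
    """Hypothetical log Pr(token | x, prefix); uniform here for simplicity.
    In practice this would be the model's next-token log-probability."""
    return math.log(1.0 / len(VOCAB))

def exhaustive_search(max_len):
    """Enumerate every sequence up to max_len and return the most probable one."""
    best_seq, best_score = None, float("-inf")
    num_candidates = 0
    for length in range(1, max_len + 1):
        for seq in itertools.product(VOCAB, repeat=length):
            num_candidates += 1
            # Sequence log-probability is the sum of per-token log-probabilities.
            score = sum(next_token_logprob(seq[:t], tok) for t, tok in enumerate(seq))
            if score > best_score:
                best_seq, best_score = seq, score
    print(f"candidates examined: {num_candidates}")  # grows as |V|^max_len
    return best_seq, best_score

exhaustive_search(max_len=5)  # 4^1 + 4^2 + ... + 4^5 = 1364 candidates already
```

This is why practical decoding relies on approximate search strategies rather than enumerating the full output space.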
