The Search Problem in LLM Inference
In Large Language Model (LLM) inference, beyond the model computation problem of calculating the output probability Pr(y|x), lies the search problem. This problem focuses on how to efficiently find the best output sequence y for a given input sequence x (or its generated KV cache). A naive approach is exhaustive search, which enumerates every possible output sequence and selects the one with the highest predicted probability. While this method guarantees a globally optimal solution, a direct exhaustive search is impractical for LLMs because the number of potential output sequences grows exponentially with the length of y.
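A minimal sketch of why exhaustive search is infeasible: for a vocabulary of size V and an output of length n, the number of candidate sequences is V**n. The function name and the example sizes below are illustrative assumptions, not from the source.

```python
def num_candidate_sequences(vocab_size: int, length: int) -> int:
    """Count the distinct output sequences of a given length.

    Each of the `length` positions can hold any of `vocab_size` tokens,
    so the search space is vocab_size ** length.
    """
    return vocab_size ** length

# Even a modest vocabulary makes the search space explode exponentially:
for n in (3, 5, 10):
    print(f"length {n}: {num_candidate_sequences(10_000, n):e} sequences")
```

With a 10,000-token vocabulary, growing the output from 3 to 5 tokens multiplies the search space by 10,000^2, i.e. from 10^12 to 10^20 candidate sequences.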

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
The Search Problem in LLM Inference
Next-Token Probability Calculation in a Transformer Decoder
In an autoregressive language model, after processing a sequence of input tokens, a corresponding sequence of hidden state vectors is produced by the final decoder layer. To predict the probability distribution for the single token that will come next, what is the correct procedure and why?
An autoregressive model generates text one token at a time. Arrange the following computational steps in the correct order to calculate the probability distribution for the very next token, given the current sequence of tokens.
Debugging a Language Model's Output Distribution
Learn After
Hypothesis in LLM Inference
Mathematical Formulation of the Search Problem in LLM Inference
Exploration vs. Exploitation in LLM Search
Search Tree Structure in Token Generation
Heuristic Search Algorithms for LLM Inference
Efficient Generation of Candidate Solutions via Search Algorithms
Search for Optimal or Sub-optimal Sequences in LLM Inference
Root of the Search Space as a Representation of Input (x)
A text generation model has a vocabulary of 10,000 possible words it can choose from for each position in a sequence. If this model were to find the optimal output by evaluating every single possible sequence, how would the total number of sequences to check change if the desired output length is increased from 3 words to 5 words?
Evaluating an Inference Strategy
The Impracticality of Exhaustive Search
Historical Context and Computational Challenges of Maximum Probability Prediction
Mathematical Representation of an Output Sequence