Search Scaling (Decoding Scaling)

Search scaling, also called decoding scaling, is an inference-time compute scaling strategy that improves large language model performance by expanding the search performed during decoding, with the aim of finding a higher-quality output sequence. It has two primary dimensions: scaling the output length (increasing the number of generated tokens) and scaling the search space (broadening the set of candidate output sequences considered).
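As a rough illustration of the search-space dimension, the sketch below runs beam search over a tiny hand-written bigram table. Everything in it is an assumption made for the example: `TOY_MODEL`, its probabilities, and the `beam_search` helper are hypothetical and stand in for a real language model and decoder. The output length is held fixed so that only the beam width changes.

```python
import math

# Hypothetical toy "model": P(next token | previous token).
# The probabilities are invented for illustration only.
TOY_MODEL = {
    "<s>": {"A": 0.5, "B": 0.4, "C": 0.1},
    "A": {"A": 0.4, "B": 0.3, "C": 0.3},
    "B": {"A": 0.9, "B": 0.05, "C": 0.05},
    "C": {"A": 0.4, "B": 0.3, "C": 0.3},
}


def beam_search(beam_width, length):
    """Return the best sequence found and its probability.

    beam_width controls the search space (candidates kept per step);
    length controls the output length (number of generated tokens).
    """
    beams = [(0.0, ["<s>"])]  # (cumulative log-probability, tokens so far)
    for _ in range(length):
        candidates = []
        for score, seq in beams:
            # Expand each kept sequence by every possible next token.
            for token, prob in TOY_MODEL[seq[-1]].items():
                candidates.append((score + math.log(prob), seq + [token]))
        # Keep only the beam_width highest-scoring candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    best_score, best_seq = beams[0]
    return best_seq[1:], math.exp(best_score)


# Width 1 is greedy decoding; width 2 considers more candidates per step.
greedy_seq, greedy_prob = beam_search(beam_width=1, length=2)
wide_seq, wide_prob = beam_search(beam_width=2, length=2)
print(greedy_seq, round(greedy_prob, 2))  # ['A', 'A'] 0.2
print(wide_seq, round(wide_prob, 2))      # ['B', 'A'] 0.36
```

In this toy setting, greedy decoding commits to the locally most likely first token and ends on a sequence with probability 0.2, while a beam of width 2 keeps a second candidate alive and finds a sequence with probability 0.36. A wider search costs more computation per step, which is the trade-off that search scaling makes at inference time.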

Updated 2026-05-06

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences