Concept

Inference-Time Compute Scaling

Inference-time compute scaling, also known as test-time compute scaling, is a family of inference-time scaling methods that spend additional computation during the inference phase to improve model performance. Its main categories are Context Scaling (extending the input or context), Search Scaling (increasing the computational effort spent during decoding), Output Ensembling (combining multiple model outputs), and Generating and Verifying Thinking Paths (guiding the model to explicitly formulate and verify reasoning steps for complex problems).
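As a rough illustration of one of these categories, Output Ensembling, the sketch below samples several answers and returns the most common one (simple majority voting). It is a minimal sketch under stated assumptions, not a definitive implementation: `sample_fn`, `ensemble_answer`, and `make_toy_sampler` are hypothetical names introduced here, and the toy sampler stands in for a real model call. Raising `n_samples` is the knob that trades extra inference compute for a more stable answer.

```python
import random
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer; ties go to the earliest-seen answer."""
    return Counter(answers).most_common(1)[0][0]


def ensemble_answer(sample_fn: Callable[[str], str], prompt: str, n_samples: int) -> str:
    """Call the sampler n_samples times and combine the outputs by majority vote.

    sample_fn is a hypothetical stand-in for one stochastic model call.
    """
    answers = [sample_fn(prompt) for _ in range(n_samples)]
    return majority_vote(answers)


def make_toy_sampler(seed: int) -> Callable[[str], str]:
    """Build a toy sampler (an assumption for illustration, not a real model).

    It returns "42" about 60% of the time and a nearby wrong answer otherwise.
    """
    rng = random.Random(seed)

    def toy_sampler(prompt: str) -> str:
        return "42" if rng.random() < 0.6 else rng.choice(["41", "43"])

    return toy_sampler


if __name__ == "__main__":
    sampler = make_toy_sampler(seed=0)
    for n in (1, 5, 25):
        print(n, ensemble_answer(sampler, "What is 6 * 7?", n))
```

In this toy setup a single sample is wrong roughly 40% of the time, while a vote over many samples is much more likely to land on the majority answer, which is the basic trade that output ensembling makes.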

Updated 2026-05-06

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences