Inference-Time Scaling

Inference-time scaling is a strategy for improving the performance of Large Language Models (LLMs) at the point of use, notably without any parameter updates or further training. It is therefore distinct from scaling pre-training or fine-tuning. The term covers a broad family of methods that scale inference along different dimensions, including ensembling multiple sampled outputs, expanding the context length, employing more elaborate decoding algorithms, and leveraging external tools to augment the model's inherent capabilities.

Updated 2026-05-06

Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.5 Inference - Foundations of Large Language Models