Learn Before
Context Scaling
Context scaling is an inference-time compute scaling method that improves large language model performance by extending the input, or context, provided to the model. By incorporating more helpful context at inference time, the model can condition its predictions on that additional information rather than on the query alone. Approaches to context scaling include extending the prompt with input-output examples (few-shot prompting), encouraging intermediate reasoning steps (chain-of-thought prompting), and dynamically incorporating external knowledge from a database (Retrieval-Augmented Generation).
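The three approaches above all work by enlarging the prompt before a single inference call. The following is a minimal Python sketch of how each prompt might be assembled; the `generate` function is a hypothetical stand-in for a real LLM call, not a specific API.

```python
def generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM inference call."""
    return f"<output conditioned on {len(prompt)} characters of context>"

# 1. Few-shot prompting: prepend input-output examples to the query.
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    demos = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{demos}\nInput: {query}\nOutput:"

# 2. Chain-of-thought prompting: ask for intermediate reasoning steps.
def chain_of_thought_prompt(query: str) -> str:
    return f"{query}\nLet's think step by step."

# 3. Retrieval-Augmented Generation: prepend passages fetched from a
#    knowledge source before asking the question.
def rag_prompt(retrieved_passages: list[str], query: str) -> str:
    context = "\n".join(retrieved_passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Each strategy spends extra inference-time compute on a longer prompt.
prompt = few_shot_prompt([("2+2", "4"), ("3+5", "8")], "7+6")
print(generate(prompt))
```

In every case the model weights are unchanged; only the context the model conditions on grows, which is what distinguishes context scaling from training-time scaling.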
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Context Scaling
Search Scaling (Decoding Scaling)
A company deploys a pre-trained language model for real-time translation. To improve translation quality, they implement a new system where for each input sentence, the model generates three different translation options. A separate, computationally intensive process then runs to score these options and select the best one before it is shown to the user. Which statement best evaluates the most significant trade-off of this new system?
Strategies for Enhancing Code Generation
A development team enhances a language model's summarization capabilities by increasing the number of training epochs and using a larger, more powerful set of GPUs for the training process. This strategy is a clear example of improving model performance by adding computational resources during the inference phase.
Output Ensembling
Generating and Verifying Thinking Paths
Learn After
Improving Narrative Coherence in AI-Generated Stories
A developer observes that a language model is generating summaries of long articles that lack detail and miss key points. To address this, they modify the inference process to provide the model with the full, unabridged article text instead of a shorter, pre-processed version. Which statement best analyzes why this modification is likely to improve the quality of the generated summary?
Evaluating Context Expansion for a Chatbot
Few-Shot Learning in Prompting
Chain-of-Thought (CoT) Prompting
Retrieval-Augmented Generation (RAG)