1Cademy - Decoding as a Search Process in LLMs

Learn Before

Core Topics in LLM Development and Scaling

Concept

Decoding as a Search Process in LLMs

In large language models, decoding is treated as a search problem: given an input, the goal is to efficiently find the best output sequence, typically the one to which the model assigns the highest probability.
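To make the search framing concrete, here is a minimal sketch that compares two common search strategies, greedy search and beam search, on a tiny hand-written model. The probability table `TOY_MODEL` and the helpers `next_token_probs`, `greedy_decode`, and `beam_search` are hypothetical names invented for this illustration. They are not taken from the course material or from any library.

```python
import math

# Hypothetical toy "model": next-token probabilities for each prefix.
# The numbers are invented so that the two strategies disagree.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"c": 0.55, "d": 0.45},
    ("b",): {"e": 0.9, "f": 0.1},
}


def next_token_probs(prefix):
    """Return the next-token distribution for a prefix (list of tokens)."""
    return TOY_MODEL[tuple(prefix)]


def greedy_decode(steps):
    """At each step, commit to the single most probable next token."""
    seq, logp = [], 0.0
    for _ in range(steps):
        probs = next_token_probs(seq)
        token = max(probs, key=probs.get)
        seq.append(token)
        logp += math.log(probs[token])
    return seq, logp


def beam_search(steps, beam_width):
    """Keep the `beam_width` best partial sequences at each step."""
    beams = [([], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for seq, logp in beams:
            for token, p in next_token_probs(seq).items():
                candidates.append((seq + [token], logp + math.log(p)))
        # Rank all extensions by score and keep only the best few.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


greedy_seq, greedy_logp = greedy_decode(steps=2)
beam_seq, beam_logp = beam_search(steps=2, beam_width=2)

print(greedy_seq, round(math.exp(greedy_logp), 2))  # ['a', 'c'] 0.33
print(beam_seq, round(math.exp(beam_logp), 2))      # ['b', 'e'] 0.36
```

In this toy setting, greedy search commits to "a" because it is locally the most probable token, and ends with a sequence of probability 0.33. Beam search with width 2 keeps "b" alive and finds the sequence with probability 0.36. The number of candidate sequences grows exponentially with length, so practical decoders rely on approximate strategies like these instead of enumerating every sequence.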

Updated 2025-10-07

References

Reference of Foundations of Large Language Models Course

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Fundamental LLM Training Objective
Diverse and Combined Data Sources for LLM Pre-training
Traditional View on Diminishing Returns from Scaling
Text Generation Probability
Two Primary Approaches to Scaling LLMs
Scaling Laws as a Fundamental Principle in LLM Development
Decoding as a Search Process in LLMs
The Virtuous Cycle of Scaling in Language Models
Computational Infeasibility of Standard Transformers for Long Sequences
LLM Scaling Strategy for a New Application
Comparison of Traditional vs. Modern Views on LLM Scaling
Modern View on Continued Performance Gains from Scaling
Mathematical Notation for Text Generation Probability
A research team is developing a large language model designed to analyze and summarize entire novels in a single pass. Based on the core principles of scaling these models, what is the primary architectural challenge they must overcome?
A development team is building a large-scale language model and has a fixed budget for the computational resources required for training. They observe that their current model, which has a moderately complex architecture, stops improving its performance even when they continue training it on their existing large dataset. To achieve a significant leap in the model's capabilities, which of the following approaches represents the most effective use of their limited computational budget?
A leading AI research lab is deciding between two major projects for their next-generation language model.
- Project Alpha: Aims to train a model on a dataset ten times larger than any previously used, using a well-established architecture that has known limitations with very long text inputs.
- Project Beta: Aims to develop a novel model architecture capable of processing entire books as a single input, but due to the experimental nature and computational cost of this new design, i

Learn After

Decoder
A language model is generating a sentence and considers two different methods for choosing the sequence of words:
- Method A: At each step, the model selects the single most probable word and adds it to the sequence before moving to the next step.
- Method B: At each step, the model keeps track of the three most probable partial sentences generated so far, extends each of them with their most likely next words, and then keeps the three best resulting sentences to continue the process.
Inferring Search Strategy from LLM Output
Explaining the Search Problem in Text Generation
