Mathematical Formulation of the Search Problem in LLM Inference
The search problem in Large Language Model (LLM) inference can be stated mathematically as finding the optimal output sequence $\hat{y}$ in the search space $\mathcal{Y}$ of all possible output sequences, namely the one that maximizes the conditional probability given the input sequence $x$. Formally:

$$\hat{y} = \arg\max_{y \in \mathcal{Y}} \Pr(y \mid x)$$
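To make the objective concrete, the following is a minimal sketch of exhaustive search over a tiny search space. Everything in it is an illustrative assumption rather than part of the course material: the three-token vocabulary, the fixed output length of two, the `NEXT_TOKEN_PROBS` table, and the helper names `log_prob` and `exhaustive_search`. The table also conditions each token only on the previous one, which is a simplification; an actual LLM conditions on the full input and all previously generated tokens.

```python
import itertools
import math

# Hypothetical toy vocabulary and fixed output length (illustrative only).
VOCAB = ["cat", "sat", "on"]
LENGTH = 2

# Hypothetical next-token probabilities Pr(token | previous token).
# "<x>" stands in for the input sequence x. This table is an assumption
# made for the example, not the output of a real model.
NEXT_TOKEN_PROBS = {
    "<x>": {"cat": 0.6, "sat": 0.3, "on": 0.1},
    "cat": {"cat": 0.1, "sat": 0.7, "on": 0.2},
    "sat": {"cat": 0.2, "sat": 0.1, "on": 0.7},
    "on": {"cat": 0.8, "sat": 0.1, "on": 0.1},
}


def log_prob(y, context="<x>"):
    """Return log Pr(y | x) as a sum of per-token log-probabilities."""
    total = 0.0
    prev = context
    for token in y:
        total += math.log(NEXT_TOKEN_PROBS[prev][token])
        prev = token
    return total


def exhaustive_search(vocab, length):
    """Enumerate every sequence of the given length and return the argmax."""
    search_space = itertools.product(vocab, repeat=length)
    return max(search_space, key=log_prob)


# The search space holds |V|^L sequences: 3^2 = 9 in this toy setting.
search_space_size = len(VOCAB) ** LENGTH
best = exhaustive_search(VOCAB, LENGTH)
print(search_space_size, best, round(math.exp(log_prob(best)), 2))
```

Enumeration is only feasible here because the space contains nine sequences. With a vocabulary of 10,000 tokens, an output of length 3 already has 10,000^3 = 10^12 candidates and an output of length 5 has 10,000^5 = 10^20, which is why practical inference relies on heuristic search instead.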

References
Reference of Foundations of Large Language Models Course
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Hypothesis in LLM Inference
Mathematical Formulation of the Search Problem in LLM Inference
Exploration vs. Exploitation in LLM Search
Search Tree Structure in Token Generation
Heuristic Search Algorithms for LLM Inference
Efficient Generation of Candidate Solutions via Search Algorithms
Search for Optimal or Sub-optimal Sequences in LLM Inference
Root of the Search Space as a Representation of Input (x)
A text generation model has a vocabulary of 10,000 possible words it can choose from for each position in a sequence. If this model were to find the optimal output by evaluating every single possible sequence, how would the total number of sequences to check change if the desired output length is increased from 3 words to 5 words?
Evaluating an Inference Strategy
The Impracticality of Exhaustive Search
Historical Context and Computational Challenges of Maximum Probability Prediction
Mathematical Representation of an Output Sequence
Formula for the Search Space as a Union of Complete Sequences
Formula for the Expansion of the Search Space at Each Step
A simplified language model has a vocabulary consisting of only three unique tokens: 'cat', 'sat', and 'on'. The model is configured to generate an output sequence with a fixed length of exactly two tokens. Which of the following options correctly represents the complete set of all possible output sequences the model can generate?
Analyzing Search Space Dimensions
Growth of the Generative Search Space
Learn After
A language model is generating a response based on a user's input. For this input, the model can generate many different possible sequences of words. The model's core task is to select the single best sequence from all these possibilities. According to the mathematical objective that governs this selection, which principle should the model follow?
Autoregressive Decomposition of the LLM Inference Objective
Optimal Sequence Selection
Search for Optimal Output Sequence in LLMs
Interpreting the LLM Search Objective