Heuristic Search Algorithms for LLM Inference
Because an exhaustive search over the exponentially large space of possible output sequences is computationally infeasible, LLM inference relies on heuristic search algorithms. These practical methods use pruning to navigate the vast output space efficiently: at each decoding step, low-probability or low-quality continuations are discarded rather than explored, trading guaranteed optimality for a manageable computational cost. Common heuristic techniques include greedy search and sampling-based search.
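The two techniques named above can be sketched on a toy next-token distribution. This is a minimal illustration, not a real model's output: the vocabulary, probabilities, and function names below are invented for the example, and the "pruning" here is simple top-k truncation of the candidate set.

```python
import random

# Hypothetical next-token distribution for one decoding step
# (illustrative values only, not from an actual LLM).
next_token_probs = {
    "Paris": 0.40,
    "a": 0.25,
    "the": 0.20,
    "located": 0.10,
    "banana": 0.05,
}

def greedy_step(probs):
    """Greedy search: always pick the single most probable token."""
    return max(probs, key=probs.get)

def top_k_sample_step(probs, k=3, rng=random):
    """Sampling-based search with pruning: keep only the k most
    probable tokens, then sample from that truncated set."""
    top_k = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens, weights = zip(*top_k)
    return rng.choices(tokens, weights=weights, k=1)[0]

print(greedy_step(next_token_probs))        # always "Paris"
print(top_k_sample_step(next_token_probs))  # one of "Paris", "a", "the"
```

Greedy search is deterministic and cheap but can miss globally better sequences; top-k sampling explores more of the space while still pruning away the long tail of improbable tokens.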
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Heuristic Search Algorithms for LLM Inference
Stopping Criteria in LLM Inference
Computational Infeasibility of Exhaustive Search in LLM Decoding
A language model is given the prompt 'The capital of France is'. Internally, the model's calculations show that the single most probable next word is 'Paris'. However, the model ultimately generates the sequence 'The capital of France is a beautiful city'. Which statement best analyzes the reason for this discrepancy?
The Challenge of Generating Optimal Text
Analyzing Text Generation Behavior
Hypothesis in LLM Inference
Mathematical Formulation of the Search Problem in LLM Inference
Exploration vs. Exploitation in LLM Search
Search Tree Structure in Token Generation
Efficient Generation of Candidate Solutions via Search Algorithms
Search for Optimal or Sub-optimal Sequences in LLM Inference
Root of the Search Space as a Representation of Input (x)
A text generation model has a vocabulary of 10,000 possible words it can choose from for each position in a sequence. If this model were to find the optimal output by evaluating every single possible sequence, how would the total number of sequences to check change if the desired output length is increased from 3 words to 5 words?
Evaluating an Inference Strategy
The Impracticality of Exhaustive Search
Historical Context and Computational Challenges of Maximum Probability Prediction
Mathematical Representation of an Output Sequence
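The vocabulary-size question in the list above reduces to simple arithmetic: an exhaustive search must evaluate |V|^L candidate sequences for vocabulary size |V| and output length L. A quick check, using the numbers from the question itself:

```python
# Exhaustive search space size: |V|**L sequences for
# vocabulary size |V| and output length L.
V = 10_000

seqs_len_3 = V ** 3            # 10^12 candidate sequences
seqs_len_5 = V ** 5            # 10^20 candidate sequences
growth = seqs_len_5 // seqs_len_3  # grows by a factor of 10^8

print(f"{seqs_len_3:.0e} -> {seqs_len_5:.0e} (x{growth:.0e})")
```

Extending the output by just two words multiplies the search space by V^2 = 10^8, which is why exhaustive search is abandoned in favor of the heuristic methods this note describes.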
Learn After
Sampling-Based Search for LLM Inference
Sequence Evaluation using Log-Probability
Deterministic Decoding Algorithms
Modifying the Search Objective to Improve Decoding
Maximum a Posteriori (MAP) Decoding
Speculative Decoding
Structured Search in Decoding
Trade-off between Search Quality and Computational Efficiency in Heuristic Search
An engineer is building a real-time chatbot that must respond to user queries very quickly. To achieve this speed, the engineer implements a text generation strategy that, at each step of forming a response, considers only a small subset of the most likely next words instead of all possible words in the vocabulary. What is the fundamental trade-off inherent in this design choice?
Evaluating a Decoding Algorithm Claim
Analysis of Competing Text Generation Systems