Learn Before
Trade-off between Search Quality and Computational Efficiency in Heuristic Search
Heuristic search algorithms used in LLM inference, such as greedy search and sampling-based methods, approximate the optimal output rather than compute it exactly, because exhaustively scoring every possible output sequence is computationally intractable. Each method therefore trades output quality against computational cost: the more candidate tokens it considers at each step, the better the result it can find, but the more compute and latency it incurs.
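This trade-off can be made concrete with a minimal sketch. The vocabulary and probabilities below are invented for illustration; the two functions contrast greedy search (inspect only the single most likely token: cheapest, deterministic) with top-k sampling (consider the k most likely tokens: more diverse output at slightly higher cost, with larger k widening the search).

```python
import random

# Hypothetical next-token distribution over a toy vocabulary.
vocab_probs = {
    "the": 0.40, "a": 0.25, "cat": 0.15,
    "dog": 0.10, "ran": 0.06, "zebra": 0.04,
}

def greedy_pick(probs):
    """Greedy search: take the single most likely token.
    One argmax over the vocabulary; cheap and deterministic,
    but it can commit to locally good, globally poor choices."""
    return max(probs, key=probs.get)

def top_k_sample(probs, k, rng):
    """Top-k sampling: keep only the k most likely tokens,
    renormalize their probabilities, then sample one.
    Smaller k is cheaper and more predictable; larger k
    explores more candidates at each step."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    r = rng.random() * total
    for tok, p in top:
        r -= p
        if r <= 0:
            return tok
    return top[-1][0]  # guard against floating-point rounding

rng = random.Random(0)
print(greedy_pick(vocab_probs))                  # always "the"
print(top_k_sample(vocab_probs, k=3, rng=rng))   # one of the top-3 tokens
```

Raising k never changes the cost of greedy search, but it directly widens the candidate set that top-k sampling must sort, renormalize, and sample from, which is the quality-versus-efficiency dial the questions below probe.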
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Sampling-Based Search for LLM Inference
Sequence Evaluation using Log-Probability
Deterministic Decoding Algorithms
Modifying the Search Objective to Improve Decoding
Maximum a Posteriori (MAP) Decoding
Speculative Decoding
Structured Search in Decoding
An engineer is building a real-time chatbot that must respond to user queries very quickly. To achieve this speed, the engineer implements a text generation strategy that, at each step of forming a response, considers only a small subset of the most likely next words instead of all possible words in the vocabulary. What is the fundamental trade-off inherent in this design choice?
Evaluating a Decoding Algorithm Claim
Analysis of Competing Text Generation Systems
Learn After
Chatbot Configuration Decision
An engineer is configuring a text generation system that produces sentences one word at a time. They adjust a setting that significantly increases the number of potential next words the system considers at each step before making a choice. Which of the following outcomes is the most likely consequence of this change?
Match each text generation strategy with its most accurate description regarding the balance between output quality and computational cost.