Inference System Optimization
Given the company's primary objective described in the case study, should they implement a decoding-prioritized strategy (which processes each batch of requests to full completion before starting the next) or a prefilling-prioritized strategy (which adds new requests to the batch as soon as any processing capacity becomes available)? Justify your choice by explaining the key trade-off involved.
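To make the trade-off concrete, here is a toy, token-level simulation of the two strategies (a sketch under illustrative assumptions: request lengths, capacity, and the `simulate` helper are all invented for this example, not taken from the case study). Each request needs `length` steps of compute; the static, decode-prioritized scheduler drains a batch completely before admitting new requests, while the continuous, prefill-prioritized scheduler refills any freed slot every step.

```python
def simulate(lengths, capacity, continuous):
    """Toy simulation: each request needs lengths[i] steps in a slot.
    Returns (per-request finish times, makespan)."""
    queue = list(range(len(lengths)))   # FIFO of waiting request ids
    active = {}                         # request id -> remaining steps
    finish = [None] * len(lengths)
    t = 0
    while queue or active:
        # Admission policy is the only difference between the two strategies:
        # continuous batching refills free slots on every step, whereas
        # static batching forms a new batch only once the old one has drained.
        if continuous or not active:
            while queue and len(active) < capacity:
                rid = queue.pop(0)
                active[rid] = lengths[rid]
        t += 1
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:        # request done: slot frees up
                finish[rid] = t
                del active[rid]
    return finish, t

# Four requests of mixed lengths, two batch slots.
fin_s, makespan_s = simulate([2, 8, 2, 8], capacity=2, continuous=False)
fin_c, makespan_c = simulate([2, 8, 2, 8], capacity=2, continuous=True)
print(makespan_s, makespan_c)  # static: 16 steps, continuous: 12 steps
```

With mixed request lengths, the continuous scheduler finishes the same workload sooner because slots vacated by short requests are reused immediately instead of idling until the longest request in the batch completes. Note what this toy model deliberately omits: in a real system, admitting a new request triggers a prefill that competes with in-flight decoding, so prefill-prioritized scheduling buys throughput at the cost of higher and more variable time-between-tokens for users already being served.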
Tags
Ch.5 Inference - Foundations of Large Language Models