Learn Before
Analyzing Trade-offs in Deadline-Aware LLM Scheduling
An organization uses a large language model for two main tasks: generating complex financial reports with strict deadlines and answering real-time, interactive queries from business analysts. To ensure deadlines are met, they implement a custom scheduling policy that gives absolute priority to any report-generation request with a deadline approaching within the next hour. Analyze the potential negative consequences of this scheduling policy on the overall system performance and user experience. What specific issues might arise for the interactive queries?
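As a starting point for the analysis, the policy's main risk can be made concrete with a small sketch. This is a hypothetical illustration (the function names, the 3600-second window, and the request fields are all invented for this example, not part of the scenario): a strict-priority queue in which any report job due within the next hour outranks every interactive query, so interactive queries wait behind the entire backlog of near-deadline reports.

```python
import heapq

# Hypothetical sketch: strict priority for report jobs whose deadline is
# within the next hour (3600 s); everything else is served FIFO afterwards.
REPORT, INTERACTIVE = 0, 1  # lower tuple value = higher priority

def priority(req, now):
    """Near-deadline reports get absolute priority; others are FIFO."""
    if req["kind"] == "report" and req["deadline"] - now <= 3600:
        return (REPORT, req["deadline"])
    return (INTERACTIVE, req["arrival"])

def schedule(requests, now=0):
    """Return the order in which the queued requests would be served."""
    heap = [(priority(r, now), i, r) for i, r in enumerate(requests)]
    heapq.heapify(heap)
    return [r["kind"] for _, _, r in
            (heapq.heappop(heap) for _ in range(len(heap)))]

queue = [
    {"kind": "interactive", "arrival": 0},
    {"kind": "report", "deadline": 1800, "arrival": 1},
    {"kind": "interactive", "arrival": 2},
    {"kind": "report", "deadline": 3000, "arrival": 3},
]
print(schedule(queue))
```

Running the sketch shows both reports draining before either interactive query, even the one that arrived first: under sustained report load, interactive queries can be starved indefinitely, which is the core trade-off the question asks you to analyze.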
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating Scheduling Policies for a Multi-Tenant LLM Service
An inference serving system for a large language model must handle requests from two user tiers: 'Premium' users who pay for guaranteed low latency, and 'Standard' users. The system also runs internal, non-urgent 'Analytics' jobs that can tolerate high latency. The primary business goal is to retain Premium users by meeting their low-latency expectations, while still processing requests from other tiers. Which custom scheduling policy would be the most effective for achieving this business goal?
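One candidate policy can be sketched as a tiered dispatch rule. This is a hypothetical illustration (tier names aside, the function and fields are invented for this example): Premium requests are always dispatched first, Standard next, and Analytics jobs run only when no paying request is queued, making them fully deferrable background work.

```python
# Hypothetical sketch: strict tier ordering with FIFO within each tier.
TIER_ORDER = {"premium": 0, "standard": 1, "analytics": 2}

def next_request(queue):
    """Pick the queued request from the highest tier, FIFO within a tier."""
    return min(queue, key=lambda r: (TIER_ORDER[r["tier"]], r["arrival"]))

queue = [
    {"tier": "analytics", "arrival": 0},
    {"tier": "standard", "arrival": 1},
    {"tier": "premium", "arrival": 2},
]
print(next_request(queue)["tier"])
```

Even though the Premium request arrived last, it is dispatched first, which is how such a policy protects Premium latency while Analytics absorbs the delay.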