Scheduling Overhead in LLM Inference
An LLM inference system is modified in how it processes long user prompts: instead of handling each prompt as a single, monolithic computational task, it now divides each prompt into several smaller segments that are processed sequentially. Explain why this modification increases the computational overhead specifically for the system's task scheduler.
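For intuition, here is a minimal sketch (the `Prompt` and `scheduling_decisions` names are hypothetical, not from any real inference framework) of why chunking multiplies the scheduler's workload: a prompt that previously triggered one scheduling decision now triggers one per segment, and each decision carries its own fixed bookkeeping cost.

```python
import math
from dataclasses import dataclass


@dataclass
class Prompt:
    name: str
    tokens: int


def scheduling_decisions(prompts: list[Prompt], chunk_size: int | None = None) -> int:
    """Count scheduler invocations needed to run every prompt to completion.

    Without chunking, each prompt is admitted to a batch once. With
    chunking, each prompt re-enters the run queue once per segment, so
    the scheduler must re-examine priorities, memory budgets, and batch
    composition at every segment boundary.
    """
    decisions = 0
    for p in prompts:
        if chunk_size is None:
            decisions += 1  # one admit decision per monolithic prompt
        else:
            decisions += math.ceil(p.tokens / chunk_size)  # one per segment
    return decisions


prompts = [Prompt("a", 8192), Prompt("b", 4096), Prompt("c", 16384)]
print(scheduling_decisions(prompts))                  # 3  (monolithic)
print(scheduling_decisions(prompts, chunk_size=512))  # 56 (chunked)
```

Since each decision has a roughly constant cost (queue scan, memory accounting, batch assembly), total scheduler overhead grows approximately linearly with the number of segments rather than the number of prompts.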
Tags
Ch.5 Inference - Foundations of Large Language Models