Learn Before
An inference serving system for a large language model must handle requests from two user tiers: 'Premium' users who pay for guaranteed low latency, and 'Standard' users. The system also runs internal, non-urgent 'Analytics' jobs that can tolerate high latency. The primary business goal is to retain Premium users by meeting their low-latency expectations, while still processing requests from other tiers. Which custom scheduling policy would be the most effective for achieving this business goal?
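To make the scenario concrete, here is a minimal sketch of one way a custom scheduling policy could encode tier priorities: a strict-priority queue built on Python's `heapq`, with FIFO ordering inside each tier. The tier names and numeric priority values are illustrative assumptions for this card's scenario, not a prescribed answer.

```python
import heapq
import itertools

# Hypothetical tier-to-priority mapping for the scenario above:
# lower number = served first.
TIER_PRIORITY = {"Premium": 0, "Standard": 1, "Analytics": 2}

class TieredScheduler:
    """Strict-priority queue: Premium before Standard before Analytics,
    FIFO within each tier (the counter breaks ties by arrival order)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, tier, request_id):
        # Push (priority, arrival_order, request) so the heap pops the
        # highest-priority, earliest-arriving request first.
        heapq.heappush(
            self._heap,
            (TIER_PRIORITY[tier], next(self._counter), request_id),
        )

    def next_request(self):
        if not self._heap:
            return None
        _, _, request_id = heapq.heappop(self._heap)
        return request_id
```

Note the trade-off this sketch exposes: strict priority serves Premium latency directly, but under sustained Premium load it can starve Analytics jobs, which is why refinements such as aging or weighted fair sharing are often considered alongside it.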
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating Scheduling Policies for a Multi-Tenant LLM Service
Analyzing Trade-offs in Deadline-Aware LLM Scheduling