Multiple Choice

A large language model inference system is handling a mix of requests: many short, single-word generation tasks and a few long-input processing tasks. Initially, the system exhibits low overall throughput, with the short tasks experiencing significant delays. A modification is made to the system: instead of processing each long input in one large computational step, it is broken down and processed in a series of smaller, sequential steps. After this change, overall throughput increases and delays for short tasks are reduced. Which statement best analyzes why this modification was effective?
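The modification the question describes is commonly known as chunked prefill: splitting one long prompt-processing pass into several smaller steps so the scheduler can interleave short decode requests between them. The toy simulation below (a minimal sketch; the workload sizes, chunk size, and `simulate` helper are illustrative assumptions, not any real engine's scheduler) shows the effect on completion times.

```python
from collections import deque

def simulate(chunk_size):
    """Toy single-engine scheduler: one long prefill of 8 compute units,
    three short 1-unit tasks queued at t=0. Between prefill chunks,
    one waiting short task is served. Returns finish times per task."""
    long_remaining = 8
    shorts = deque(["s1", "s2", "s3"])
    finish_times = {}
    t = 0
    while long_remaining > 0 or shorts:
        if long_remaining > 0:
            step = min(chunk_size, long_remaining)  # process one prefill chunk
            long_remaining -= step
            t += step
            if long_remaining == 0:
                finish_times["long"] = t
        if shorts:
            # a short task slips in between chunks
            finish_times[shorts.popleft()] = t + 1
            t += 1
    return finish_times

print(simulate(chunk_size=8))  # monolithic prefill: {'long': 8, 's1': 9, 's2': 10, 's3': 11}
print(simulate(chunk_size=2))  # chunked prefill:    {'s1': 3, 's2': 6, 's3': 9, 'long': 11}
```

With chunking, the short tasks finish at steps 3, 6, and 9 instead of 9, 10, and 11, cutting their delays and lowering mean completion time, while the long task finishes slightly later. This is the tradeoff the question asks test-takers to analyze.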

Updated 2025-09-26

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science