Multiple Choice

A large language model inference system is processing two user requests concurrently. Request 1 has a very long prompt that requires significant initial computation. Request 2 is already in the process of generating a response, producing one token at a time. The system's scheduler operates by breaking the initial computation for Request 1 into three smaller chunks. It processes the first chunk of Request 1, then generates one token for Request 2, then processes the second chunk of Request 1, then generates another token for Request 2, and so on. What is the primary advantage of this interleaved processing strategy?
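The schedule the question describes can be illustrated with a minimal sketch. This is not any particular serving framework's scheduler; the function names (`split_into_chunks`, `interleaved_schedule`), the chunk size, and the token counts are illustrative assumptions. The sketch only shows the ordering of work: one chunk of Request 1's prompt per step, alternated with one generated token for Request 2, so Request 2 keeps receiving tokens at a steady pace instead of waiting for Request 1's entire prompt to finish.

```python
from collections import deque

def split_into_chunks(prompt_tokens, chunk_size):
    """Split a long prompt into fixed-size chunks for incremental processing."""
    return deque(prompt_tokens[i:i + chunk_size]
                 for i in range(0, len(prompt_tokens), chunk_size))

def interleaved_schedule(prompt_chunks, tokens_to_generate):
    """Alternate one prompt chunk (Request 1) with one generated token (Request 2)."""
    steps = []
    generated = 0
    while prompt_chunks or generated < tokens_to_generate:
        if prompt_chunks:
            # Process the next chunk of Request 1's long prompt.
            steps.append(("request_1_prompt_chunk", len(prompt_chunks.popleft())))
        if generated < tokens_to_generate:
            # Generate one more token for Request 2, which is mid-response.
            steps.append(("request_2_generate_token", 1))
            generated += 1
    return steps

if __name__ == "__main__":
    # Request 1: a long prompt split into three chunks; Request 2: still generating.
    chunks = split_into_chunks(list(range(300)), chunk_size=100)
    for step in interleaved_schedule(chunks, tokens_to_generate=5):
        print(step)
```

Running the sketch prints the alternating step order, making the scheduling trade-off concrete: Request 2's token stream is never blocked behind the whole of Request 1's prompt computation.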

Updated 2025-09-26

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science