Short Answer

Resource Reallocation in Dynamic Batching

An inference engine is processing a batch containing four active sequences (A, B, C, and D). After one processing iteration, sequence B generates its end-of-sequence token. Describe the two primary changes to the system's state that the scheduler will implement before the next iteration begins, and explain the direct benefit of these changes for overall system efficiency.
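The scenario above describes continuous (dynamic) batching. The sketch below, using hypothetical names (`Scheduler`, `kv_cache`, `waiting`), illustrates the two state changes a scheduler typically makes when one sequence finishes: evicting the finished sequence and freeing its KV-cache memory, then backfilling the freed batch slot from the waiting queue; real engines manage paged cache blocks rather than the placeholder objects used here.

```python
from collections import deque

EOS = "<eos>"  # sentinel end-of-sequence token for this sketch

class Scheduler:
    """Minimal sketch of a continuous-batching scheduler."""

    def __init__(self, max_batch_size):
        self.max_batch_size = max_batch_size
        self.active = {}        # seq_id -> generated tokens so far
        self.kv_cache = {}      # seq_id -> placeholder for cache memory
        self.waiting = deque()  # queued request ids

    def step(self, new_tokens):
        """Apply one iteration's outputs: new_tokens maps seq_id -> token."""
        for seq_id, tok in new_tokens.items():
            self.active[seq_id].append(tok)
            if tok == EOS:
                # Change 1: evict the finished sequence and free its
                # KV-cache allocation immediately.
                del self.active[seq_id]
                del self.kv_cache[seq_id]
        # Change 2: backfill freed slots from the waiting queue so the
        # next iteration runs at full batch width (no idle capacity).
        while self.waiting and len(self.active) < self.max_batch_size:
            req = self.waiting.popleft()
            self.active[req] = []
            self.kv_cache[req] = object()  # placeholder allocation
        return list(self.active)

# Batch of four active sequences, with one request queued behind them.
sched = Scheduler(max_batch_size=4)
for s in ["A", "B", "C", "D"]:
    sched.active[s] = []
    sched.kv_cache[s] = object()
sched.waiting.append("E")

# Sequence B emits EOS during this iteration; E takes its slot.
batch = sched.step({"A": "x", "B": EOS, "C": "y", "D": "z"})
# batch is now ["A", "C", "D", "E"]
```

The direct efficiency benefit follows from the two changes: freed KV-cache memory caps the engine's memory footprint, and backfilling means no forward pass is wasted computing over an already-finished sequence.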

Updated 2025-10-10

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy