Short Answer

Analyzing a System State in Continuous Batching

An LLM inference engine using a continuous batching strategy is processing two separate requests simultaneously. After the most recent computational step, the engine's state is as follows:

  • Request 1 has generated the sequence: "The cat sat on"
  • Request 2 has generated the sequence: "Once upon a"

In the immediately preceding step, the generated sequences were "The cat sat" and "Once upon", respectively. No new requests have arrived, and neither request has finished generating.

Based on this change in state, describe the single computational operation that was just performed and explain why this operation is a key feature of the iterative decoding phase for a batch.
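The state transition described above can be sketched in code. The following is a minimal, hypothetical illustration, not a real engine: the names (`Request`, `decode_step`, `next_token`) are invented for this sketch, and `next_token` is a hard-coded stand-in for sampling from the model's output distribution. The point it demonstrates is that one decode step appends exactly one token to every unfinished sequence in the batch.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    tokens: list[str] = field(default_factory=list)
    finished: bool = False

def next_token(tokens):
    # Stand-in for the model: in a real engine the next token comes from a
    # batched forward pass over all active sequences, not a lookup table.
    continuations = {
        ("The", "cat", "sat"): "on",
        ("Once", "upon"): "a",
    }
    return continuations.get(tuple(tokens), "<eos>")

def decode_step(batch):
    """One iterative decoding step: append one token per unfinished request.

    In a real engine this corresponds to a single batched forward pass in
    which every active sequence advances by exactly one token in parallel.
    """
    for req in batch:
        if not req.finished:
            req.tokens.append(next_token(req.tokens))

batch = [
    Request(tokens=["The", "cat", "sat"]),  # Request 1, previous state
    Request(tokens=["Once", "upon"]),       # Request 2, previous state
]
decode_step(batch)
print(" ".join(batch[0].tokens))  # The cat sat on
print(" ".join(batch[1].tokens))  # Once upon a
```

Running `decode_step` once reproduces exactly the transition in the question: both sequences grow by a single token in the same step, which is the defining behavior of the iterative decoding phase under batching.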


Updated 2025-10-09

Tags: Ch.5 Inference - Foundations of Large Language Models