Request Processing Workflow in LLM Inference

In a typical LLM inference system, the process begins with a pool of incoming user requests. A scheduler manages this pool, grouping individual requests into a batch. This batch is then dispatched to the inference engine, which executes the model to process the entire batch and produce the corresponding predictions.
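As a minimal sketch of this loop, the Python below models the request pool, a scheduler that drains it into fixed-size batches, and a stub standing in for the inference engine. All names here (Request, Scheduler, run_engine, max_batch_size) are hypothetical choices for illustration, and the stub returns placeholder strings where a real engine would execute the model's forward pass.

from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    request_id: int
    prompt: str

class Scheduler:
    """Manages the pool of pending requests and groups them into batches."""

    def __init__(self, max_batch_size: int = 8):
        self.pool: deque[Request] = deque()
        self.max_batch_size = max_batch_size

    def add(self, request: Request) -> None:
        # New user requests enter the pool as they arrive.
        self.pool.append(request)

    def next_batch(self) -> list[Request]:
        # Drain up to max_batch_size requests from the pool into one batch.
        batch: list[Request] = []
        while self.pool and len(batch) < self.max_batch_size:
            batch.append(self.pool.popleft())
        return batch

def run_engine(batch: list[Request]) -> list[str]:
    """Stub inference engine: a real system would run the model over the
    whole batch here and return one prediction per request."""
    return [f"<prediction for request {r.request_id}>" for r in batch]

if __name__ == "__main__":
    scheduler = Scheduler(max_batch_size=4)
    for i, prompt in enumerate(["Hello", "Translate this", "Summarize that"]):
        scheduler.add(Request(i, prompt))

    batch = scheduler.next_batch()          # scheduler forms a batch
    predictions = run_engine(batch)         # engine processes the batch
    for req, pred in zip(batch, predictions):
        print(req.request_id, pred)

Batching requests this way is what lets the engine amortize the cost of a forward pass across many users at once, which is why the scheduler sits between the request pool and the model.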


