Learn Before
Diagram of the LLM Inference Workflow
This diagram illustrates the high-level workflow of an LLM inference system. It begins with a 'Request Pool' containing user inputs (e.g., x1, x2, x3). A 'Scheduler' selects requests from this pool and groups them into a 'batch', which is then sent to the 'Inference Engine'. The engine executes the model on the batch and returns the corresponding 'Predictions' (e.g., y2, y1, y3); as the example suggests, predictions may be returned in a different order than the requests arrived. A minimal sketch of this loop follows below.
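The sketch below is a simplified, hypothetical illustration of this Request Pool -> Scheduler -> Inference Engine -> Predictions flow, not an implementation from the course. The Scheduler class, max_batch_size parameter, and the stand-in engine function are assumptions made purely for illustration.

from collections import deque
from typing import Callable, List


class Scheduler:
    """Selects waiting requests from the pool and groups them into a batch."""

    def __init__(self, request_pool: deque, max_batch_size: int = 8):
        self.request_pool = request_pool
        self.max_batch_size = max_batch_size

    def next_batch(self) -> List[str]:
        batch = []
        # Pull up to max_batch_size waiting requests (simple FIFO order here).
        while self.request_pool and len(batch) < self.max_batch_size:
            batch.append(self.request_pool.popleft())
        return batch


def serve(request_pool: deque, engine: Callable[[List[str]], List[str]]) -> List[str]:
    """Drive the Request Pool -> Scheduler -> Inference Engine -> Predictions loop."""
    scheduler = Scheduler(request_pool)
    predictions = []
    while request_pool:
        batch = scheduler.next_batch()      # Scheduler forms a batch from the pool
        predictions.extend(engine(batch))   # Engine executes the model on the batch
    return predictions


if __name__ == "__main__":
    # Stand-in "engine": maps each input x_i to a placeholder prediction y(x_i).
    fake_engine = lambda batch: [f"y({x})" for x in batch]
    pool = deque(["x1", "x2", "x3"])
    print(serve(pool, fake_engine))  # -> ['y(x1)', 'y(x2)', 'y(x3)']

In a real system the scheduler would typically run concurrently with the engine and use a more sophisticated batching policy (e.g., forming batches by size or latency targets) rather than this sequential FIFO loop.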
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
An operations team monitors an LLM inference system and notices that the hardware responsible for model execution is consistently underutilized, even when there is a continuous stream of user requests waiting to be processed. This leads to lower-than-expected overall system throughput. In a standard workflow where requests are grouped into batches by a scheduler before being processed, what is the most probable explanation for this specific performance issue?
Arrange the following stages of a typical request processing workflow in a Large Language Model (LLM) inference system into the correct chronological order, from the initial arrival of a request to the final output.
Diagram of the LLM Inference Workflow
LLM Inference Scheduling Strategy
Learn After
A team is building a system to generate text predictions from user inputs. They have designed the following process: First, an 'Inference Engine' processes a group of user inputs. Next, a 'Scheduler' organizes the resulting predictions. Finally, these organized predictions are delivered to the users. Based on the standard processing workflow for such systems, what is the primary logical flaw in this team's design?
A system is designed to generate predictions from user inputs. Arrange the following stages of its processing workflow into the correct chronological order, from receiving an input to producing an output.
Diagnosing a Performance Bottleneck in a Text Generation System