Example

Diagram of the LLM Inference Workflow

This diagram illustrates the high-level workflow of an LLM inference system. It begins with a 'Request Pool' containing user inputs (e.g., x1, x2, x3). A 'Scheduler' selects requests from this pool and groups them into a 'batch'. This batch is then sent to the 'Inference Engine' for processing. Finally, the engine executes the model on the batch and returns the corresponding 'Predictions' (e.g., y2, y1, y3).
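The pool–scheduler–engine workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not the API of any real serving framework: the `Scheduler`, `next_batch`, and `inference_engine` names are hypothetical, and the "model" simply echoes a prediction per request.

```python
from collections import deque

class Scheduler:
    """Selects pending requests from a pool and groups them into batches."""

    def __init__(self, batch_size):
        self.pool = deque()          # the 'Request Pool' of user inputs
        self.batch_size = batch_size

    def submit(self, request_id, prompt):
        self.pool.append((request_id, prompt))

    def next_batch(self):
        # Take up to batch_size requests from the pool, FIFO order.
        batch = []
        while self.pool and len(batch) < self.batch_size:
            batch.append(self.pool.popleft())
        return batch

def inference_engine(batch):
    # Stand-in for executing the model on the whole batch; here each
    # input is mapped to a placeholder prediction string.
    return {rid: f"y({prompt})" for rid, prompt in batch}

scheduler = Scheduler(batch_size=2)
for rid, x in [("x1", "hello"), ("x2", "world"), ("x3", "!")]:
    scheduler.submit(rid, x)

batch = scheduler.next_batch()        # groups x1 and x2 into one batch
predictions = inference_engine(batch) # x3 waits for the next batch
```

Batching amortizes the fixed cost of a model forward pass across several requests, which is why the scheduler sits between the request pool and the inference engine.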


Updated 2025-10-09


Ch.5 Inference - Foundations of Large Language Models
