Learn Before
Diagram of the LLM Inference Workflow
This diagram illustrates the high-level workflow of an LLM inference system. It begins with a 'Request Pool' containing user inputs (e.g., x1, x2, x3). A 'Scheduler' selects requests from this pool and groups them into a 'batch', which is then sent to the 'Inference Engine'. The engine executes the model on the batch and returns the corresponding 'Predictions' (e.g., y2, y1, y3); as the example suggests, predictions may be returned in a different order than the requests arrived. A minimal sketch of this loop follows below.
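The sketch below is a simplified, hypothetical illustration of this Request Pool -> Scheduler -> Inference Engine -> Predictions flow, not an implementation from the course. The Scheduler class, max_batch_size parameter, and the stand-in engine function are assumptions made purely for illustration.

from collections import deque
from typing import Callable, List


class Scheduler:
    """Selects waiting requests from the pool and groups them into a batch."""

    def __init__(self, request_pool: deque, max_batch_size: int = 8):
        self.request_pool = request_pool
        self.max_batch_size = max_batch_size

    def next_batch(self) -> List[str]:
        batch = []
        # Pull up to max_batch_size waiting requests (simple FIFO order here).
        while self.request_pool and len(batch) < self.max_batch_size:
            batch.append(self.request_pool.popleft())
        return batch


def serve(request_pool: deque, engine: Callable[[List[str]], List[str]]) -> List[str]:
    """Drive the Request Pool -> Scheduler -> Inference Engine -> Predictions loop."""
    scheduler = Scheduler(request_pool)
    predictions = []
    while request_pool:
        batch = scheduler.next_batch()      # Scheduler forms a batch from the pool
        predictions.extend(engine(batch))   # Engine executes the model on the batch
    return predictions


if __name__ == "__main__":
    # Stand-in "engine": maps each input x_i to a placeholder prediction y(x_i).
    fake_engine = lambda batch: [f"y({x})" for x in batch]
    pool = deque(["x1", "x2", "x3"])
    print(serve(pool, fake_engine))  # -> ['y(x1)', 'y(x2)', 'y(x3)']

In a real system the scheduler would typically run concurrently with the engine and use a more sophisticated batching policy (e.g., forming batches by size or latency targets) rather than this sequential FIFO loop.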
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
An operations team monitors an LLM inference system and notices that the hardware responsible for model execution is consistently underutilized, even when there is a continuous stream of user requests waiting to be processed. This leads to lower-than-expected overall system throughput. In a standard workflow where requests are grouped into batches by a scheduler before being processed, what is the most probable explanation for this specific performance issue?
Arrange the following stages of a typical request processing workflow in a Large Language Model (LLM) inference system into the correct chronological order, from the initial arrival of a request to the final output.
Diagram of the LLM Inference Workflow
LLM Inference Scheduling Strategy
Learn After
A team is building a system to generate text predictions from user inputs. They have designed the following process: First, an 'Inference Engine' processes a group of user inputs. Next, a 'Scheduler' organizes the resulting predictions. Finally, these organized predictions are delivered to the users. Based on the standard processing workflow for such systems, what is the primary logical flaw in this team's design?
A system is designed to generate predictions from user inputs. Arrange the following stages of its processing workflow into the correct chronological order, from receiving an input to producing an output.
Diagnosing a Performance Bottleneck in a Text Generation System