Learn Before
LLM Inference Request Processing
After an inference system has completed a total of 12 computational iterations since the requests arrived, what is the status of each request? Specifically, state whether each request is complete and, if not, how many output tokens have been generated for it. Justify your answer using how iterations are defined for the input-processing and output-generation phases.
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A large language model using a continuous batching inference system processes a single request. The input prompt consists of 150 tokens, and the model is configured to generate an output of 200 tokens. How many computational iterations are required to fully process this single request?
LLM Inference Request Processing
In a continuous batching system for large language model inference, each token processed, whether it belongs to the input prompt or to the generated output, counts as one separate computational iteration. Under this definition, the total number of iterations needed to fully process a request equals its input token count plus its output token count.
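Under this per-token definition of an iteration, the count for a single request can be sketched as follows (a minimal illustration; the function name is our own, not part of the course material):

```python
def iterations_required(input_tokens: int, output_tokens: int) -> int:
    """Total iterations for one request: one iteration per token,
    counting both prompt (input) and generated (output) tokens."""
    return input_tokens + output_tokens

# The related question's request: a 150-token prompt and 200 output tokens.
print(iterations_required(150, 200))  # 350
```

For the related question above, this gives 150 + 200 = 350 iterations to fully process the single request.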