Learn Before
A large language model using a continuous batching inference system processes a single request. The input prompt consists of 150 tokens, and the model is configured to generate an output of 200 tokens. How many computational iterations are required to fully process this single request?
350 iterations (150 prompt tokens + 200 output tokens, since every token processed counts as one computational iteration)
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
LLM Inference Request Processing
In a continuous batching system for large language model inference, every single token processed, whether from the input prompt or the generated output, constitutes one separate computational iteration.
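Under this counting rule, the arithmetic can be sketched in a few lines of Python; the helper name `iterations_required` is illustrative, not part of any real inference library:

```python
def iterations_required(prompt_tokens: int, output_tokens: int) -> int:
    """Count iterations assuming every token processed, whether from
    the input prompt or the generated output, takes one separate
    computational iteration in the continuous batching system."""
    return prompt_tokens + output_tokens

# The request in the question: 150-token prompt, 200-token output.
print(iterations_required(150, 200))  # 350
```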