Learn Before
Establishing the Initial Context for Inference
The inference process within the prefilling-decoding framework begins by establishing an initial context, denoted as 'x'. This input sequence serves as the foundation for the prefilling phase, where its representation is computed and stored in the KV cache.
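The split described above can be sketched in code. This is a minimal, hypothetical toy (plain Python, not a real transformer): the "key/value" for a token is a stand-in pair derived from its id, `prefill` processes the whole initial context x in one pass, and each decode step appends one new entry to the cache.

```python
def embed(token_id):
    # Stand-in for the real key/value projections of a transformer layer.
    return (token_id * 2, token_id * 3)

def prefill(prompt_ids):
    # Prefilling: compute the representation of every token in the initial
    # context x in a single parallel pass and store it in the KV cache.
    return [embed(t) for t in prompt_ids]

def decode_step(kv_cache, next_id):
    # Decoding: a newly generated token attends to the cached context,
    # then its own key/value pair is appended for later steps to reuse.
    kv_cache.append(embed(next_id))
    return kv_cache

prompt = [5, 7, 11]              # the initial context x
cache = prefill(prompt)          # one pass over all prompt tokens
cache = decode_step(cache, 13)   # one generated token per decode step
print(len(cache))                # prints 4: prompt tokens + 1 generated
```

The point of the cache is that `decode_step` never recomputes the prompt's representations; it only adds one entry per generated token.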
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Search (Decoding) Algorithms for LLM Inference
Establishing the Initial Context for Inference
A user provides a large document (e.g., 2000 tokens) as input to a language model to generate a brief, 20-token answer. Considering the widely adopted two-phase framework for inference, which statement best distinguishes the computational characteristics of processing the initial document versus generating the answer?
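A back-of-the-envelope sketch of the scenario in this question, assuming the standard characterization of the two phases (prefilling: one parallel pass over all input tokens; decoding: one forward pass per generated token). The numbers below only count forward passes, not FLOPs:

```python
prompt_len, answer_len = 2000, 20

# Prefilling: the entire 2000-token document is processed in a single
# parallel forward pass, which tends to be compute-bound.
prefill_passes = 1
tokens_per_prefill_pass = prompt_len

# Decoding: the 20-token answer requires 20 sequential forward passes,
# each handling one token, which tends to be memory-bandwidth-bound.
decode_passes = answer_len
tokens_per_decode_pass = 1

print(prefill_passes, tokens_per_prefill_pass)  # prints: 1 2000
print(decode_passes, tokens_per_decode_pass)    # prints: 20 1
```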
Analysis of the Two-Phase Inference Framework
A user submits a prompt to a large language model. Arrange the following events in the correct chronological order as they would occur within the standard two-phase inference framework.
Learn After
Prefilling Phase in Transformer Inference
A user provides the following sequence of words to a large language model: 'Write a short story about a robot who discovers music.' In the model's text generation process, what is the primary role of this initial sequence of words?
Diagnosing Inference Latency
The Role of the Initial Input Sequence