Learn Before
In an autoregressive Transformer model, generating a sequence in response to an input prompt involves two distinct phases from the perspective of the Key-Value (KV) cache. Which option correctly distinguishes the computational activities of these two phases?
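The distinction the question is testing can be made concrete with a minimal NumPy sketch (not a real Transformer; the weight matrices and shapes here are illustrative assumptions). During prefill, keys and values for every prompt token are computed in one parallel pass and written into the cache; during decode, each step computes the query, key, and value for a single new token, appends exactly one row to the cache, and attends over all cached positions.

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention of one query over all cached positions.
    scores = q @ K.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

d = 4                                   # toy hidden size (assumption)
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

prompt = rng.normal(size=(5, d))        # embeddings of 5 prompt tokens

# --- Prefill phase: process the whole prompt in parallel, filling the cache ---
K_cache = prompt @ W_k                  # keys for all prompt positions at once
V_cache = prompt @ W_v                  # values for all prompt positions at once

# --- Decode phase: generate one token at a time, appending one K/V row each ---
x = rng.normal(size=(1, d))             # embedding of the last generated token
for _ in range(3):
    q = x @ W_q                         # query only for the new token
    k, v = x @ W_k, x @ W_v
    K_cache = np.vstack([K_cache, k])   # cache grows by exactly one row
    V_cache = np.vstack([V_cache, v])
    x = attention(q, K_cache, V_cache)  # attend to all cached positions

print(K_cache.shape)                    # prompt rows + one row per decode step
```

The key contrast: prefill is one large, parallel matrix multiply over the full prompt, while each decode step does a small amount of compute but must read the entire (growing) cache.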
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Prefilling Phase in Transformer Inference
Computational Cost Comparison: Decoding vs. Prefilling
Decoding Phase in Transformer Inference
Analysis of KV Cache Utilization in Autoregressive Generation
An autoregressive language model receives an input prompt and generates a response. From the perspective of how it uses its internal memory for past context (the Key-Value cache), arrange the following high-level stages of the generation process in the correct chronological order.