Set of Indexed Key-Value Pairs
A set of Key-Value pairs indexed from 1 to $\tau$. Each element of the set is a tuple $(\mathbf{K}_{\leq i}^{[t]}, \mathbf{V}_{\leq i}^{[t]})$, where $t$ is the index (e.g., representing a layer or head). The bold letters $\mathbf{K}$ and $\mathbf{V}$ typically denote Key and Value matrices or vectors, and the subscript $\leq i$ indicates that these keys and values cover all positions up to and including $i$ in a sequence. The full expression is given by: $\{(\mathbf{K}_{\leq i}^{[t]}, \mathbf{V}_{\leq i}^{[t]})\}_{t=1}^{\tau}$.
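As a rough illustration of the definition above, the set can be pictured as a list of (keys, values) pairs, one pair per index value, where each matrix has one row per sequence position up to the current one. The sketch below is only an assumption about how one might lay this out in plain Python: the helper `make_kv_set`, the feature size `d`, and the dummy numbers are hypothetical and chosen purely so each entry can be traced back to its index and position.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows = sequence positions, columns = features


def make_kv_set(tau: int, i: int, d: int) -> List[Tuple[Matrix, Matrix]]:
    """Build a placeholder set of tau (K, V) pairs covering positions 1..i.

    Hypothetical helper: the numbers are dummies that encode the index t
    and the position, so the layout is easy to inspect.
    """
    kv_set = []
    for t in range(1, tau + 1):
        # One row per position 1..i; every row has d entries.
        keys = [[float(t * 100 + pos)] * d for pos in range(1, i + 1)]
        values = [[float(-(t * 100 + pos))] * d for pos in range(1, i + 1)]
        kv_set.append((keys, values))
    return kv_set


kv_set = make_kv_set(tau=4, i=3, d=2)
num_pairs = len(kv_set)                # tau pairs, one per index t
rows_per_key = len(kv_set[0][0])       # i rows, one per position <= i
first_key_row_of_t2 = kv_set[1][0][0]  # index t = 2, position 1
```

Here `num_pairs` plays the role of the upper bound on the index and `rows_per_key` the role of the position bound in the subscript; a real model would store learned projections rather than these placeholder values.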

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Set of Indexed Key-Value Pairs
A system component processes data using two distinct operations. The first operation, identified as 'op_alpha', results in the output 'tensor_1'. The second operation, identified as 'op_beta', results in the output 'tensor_2'. Which of the following correctly represents this information as a set of Key-Value pairs, where the operation identifier is the Key (K) and the resulting output is the Value (V)?
Data Structure for Activation Function Outputs
Analyzing a Flawed Key-Value Representation
Set of Indexed Key-Value Pairs
Set of Superscript-Indexed Vectors
Set of Key-Value Pairs
Function of a Sequence of Overlined Variables
Function of a Sequence of Averaged Vectors
Vector Slice Notation for a Sequence Window ()
Set of Sequential Vectors Notation
Vector Sequence Window Notation
Consider an autoregressive model generating a sequence of tokens one by one. At each step $i$, the model calculates attention using the query from the current token and the keys and values from all tokens generated so far (from position 1 to $i$). To optimize this process, the model maintains a growing set of all previously computed key and value vectors. What is the primary computational advantage of this strategy?
State of an Autoregressive Cache
An autoregressive language model with $\tau$ parallel computational units (e.g., attention heads) is generating a sequence of tokens. After computing the output for the 3rd token, the model stores the key and value vectors from all tokens processed so far to use in subsequent steps. Which of the following notations correctly represents the complete set of these stored key-value pairs at this specific moment?
Learn After
Attention Head Output with Grouped Queries and Causal Masking
Attention Head Output in Grouped-Query Attention (GQA)
A computational model processes sequences and, at a specific step $i$, maintains a collection of data represented as: $\{(\mathbf{K}_{\leq i}^{[t]}, \mathbf{V}_{\leq i}^{[t]})\}_{t=1}^{\tau}$. In this set, each $(\mathbf{K}_{\leq i}^{[t]}, \mathbf{V}_{\leq i}^{[t]})$ is a pair of matrices, the subscript $\leq i$ indicates that the matrices contain information for all sequence positions from the start up to position $i$, and the superscript $[t]$ is an index ranging from 1 to $\tau$. Based on this structure, which statement provides the most accurate analysis of the collection?
Interpreting a Set of Indexed Key-Value Pairs
State of Key-Value Cache During Generation
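
Several of the items listed above describe a cache that gains one position per generation step while earlier keys and values are reused rather than recomputed. The following is a minimal sketch of that idea in plain Python, not a reference implementation: the `KVCache` class, the `attend` helper, and the dummy query, key, and value vectors are all hypothetical stand-ins for a model's real projections.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over all cached positions."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


class KVCache:
    """Hypothetical cache holding tau (keys, values) pairs, one per head."""

    def __init__(self, tau: int) -> None:
        self.pairs: List[Tuple[List[Vector], List[Vector]]] = [
            ([], []) for _ in range(tau)
        ]

    def append(self, new_keys: List[Vector], new_values: List[Vector]) -> None:
        # Add the current position's key and value to every head's pair.
        for (keys, values), k, v in zip(self.pairs, new_keys, new_values):
            keys.append(k)
            values.append(v)

    def length(self) -> int:
        return len(self.pairs[0][0])


tau = 2
cache = KVCache(tau)
lengths = []
outputs = []
for i in range(1, 4):  # generate positions 1, 2, 3
    # Dummy per-head vectors for the token at position i (placeholders).
    q = [[float(i), 1.0] for _ in range(tau)]
    k = [[float(i), 0.0] for _ in range(tau)]
    v = [[float(i), float(t)] for t in range(1, tau + 1)]
    cache.append(k, v)  # only position i is new; earlier rows are reused
    outputs = [attend(q[t], *cache.pairs[t]) for t in range(tau)]
    lengths.append(cache.length())
```

After the third step each head's pair holds three rows, matching the "all positions up to and including the current one" reading of the subscript. Only one new key and one new value per head are computed at each step; the attention itself still reads every cached row.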