Learn Before
Value Vector
In an attention mechanism, the Value vector (V) contains the actual information or content of an input item. Once the attention scores are calculated by comparing the Query with the Keys, these scores are used as weights to compute a weighted average of all Value vectors. The result is the output of the attention layer.
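The weighted-average step described above can be sketched in a few lines of plain Python. This is a toy illustration, not a real model: the query, key, and value vectors below are made-up numbers chosen only to show the mechanics (dot-product scores, softmax weights, weighted sum of Values).

```python
import math

# Hypothetical toy vectors: one query, three input items (dimension 2).
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[2.0, 10.0], [4.0, 0.0], [8.0, 5.0]]

# Attention scores: dot product of the Query with each Key.
scores = [sum(q * k for q, k in zip(query, key)) for key in keys]

# Softmax turns the scores into weights that sum to 1.
exps = [math.exp(s) for s in scores]
weights = [e / sum(exps) for e in exps]

# Output: weighted average of the Value vectors, component-wise.
output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(2)]
```

Note that the Values never influence the weights; they only supply the content that gets mixed. The Query and Keys alone decide how much each item contributes.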
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Query (Attention)
Key (Attention)
Value (Attention)
State Function from Previous Outputs
Value Weight Matrix Formula
Set of Sequential Key-Value Pairs
Query Vector
Key Vector
Value Vector
Implicit Relative Position Modeling in Self-Attention with RoPE
Value Weight Matrix Definition
Imagine a system translating the sentence 'The quick brown fox jumps'. When the system is generating the output word corresponding to 'jumps', it needs to determine which words in the input sentence are most relevant. To do this, a vector representing the current translation context (i.e., 'what information do I need to produce the next word?') is compared against a set of searchable 'label' vectors, one for each word in the input sentence. This comparison generates a relevance score for each input word. Finally, a new vector is created by taking a weighted average of the 'content' vectors of the input words, using the relevance scores as weights. How do the three main vector types in this process correspond to their roles?
In a system designed to answer questions based on a provided document, the model first creates a representation of the user's question. It then compares this representation against a set of searchable representations, one for each sentence in the document, to determine relevance scores. Finally, it constructs an answer by creating a weighted combination of the informational content from each sentence, using the relevance scores as weights. Which option correctly assigns the roles of Query, Key, and Value vectors in this scenario?
Context Window of Key Vectors Notation
Key-Value Cache
In a computational mechanism designed to selectively focus on different parts of an input sequence, information is represented by three distinct types of vectors that interact to produce a context-aware output. Match each vector type to its specific role in this process.
Masked QKV Attention Formula
Learn After
In a simplified attention mechanism, the final output is a weighted average of content-carrying vectors from the input sequence. Suppose an input sequence has three items with the following content vectors and their corresponding calculated attention weights:
- Item 1: Vector = [2, 10], Weight = 0.2
- Item 2: Vector = [4, 0], Weight = 0.7
- Item 3: Vector = [8, 5], Weight = 0.1
What is the resulting output vector from this attention calculation?
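The calculation the question asks for is a straightforward component-wise weighted sum, which can be checked with a short sketch (using the item vectors and weights given above):

```python
# Content vectors and attention weights from the example above.
vectors = [[2, 10], [4, 0], [8, 5]]
weights = [0.2, 0.7, 0.1]

# Weighted average: sum of weight * vector, computed per component.
output = [
    sum(w * v[d] for w, v in zip(weights, vectors))
    for d in range(2)
]
# output is [4.0, 2.5]: 0.2*2 + 0.7*4 + 0.1*8 = 4.0 and 0.2*10 + 0.7*0 + 0.1*5 = 2.5
```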
Distinguishing Vector Roles in Attention
An engineer is debugging a self-attention layer in a language model. They observe that the attention scores (the weights assigned to each input) are being calculated correctly. However, they decide to run an experiment where they keep the Query and Key vectors for each input token exactly the same, but they replace every Value vector with a vector of all zeros. Which of the following outcomes is the most certain result of this specific change?