Learn Before
Generation of Query, Key, and Value Vectors in Self-Attention
In a self-attention layer, the Query (Q), Key (K), and Value (V) vectors are not the direct inputs themselves but are generated through linear transformations of the same input sequence. This input is typically the output from the preceding layer. Each vector in the input sequence is multiplied by three distinct weight matrices (W_Q, W_K, and W_V) to produce its corresponding Q, K, and V vectors.
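As a concrete illustration, here is a minimal NumPy sketch of these three projections. The matrix sizes and random weights are illustrative assumptions, not values from the course; in a trained model, W_Q, W_K, and W_V are learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8          # illustrative sizes (assumed, not from the course)

X = rng.normal(size=(seq_len, d_model))  # input sequence: one row vector per token

# Three distinct weight matrices (random stand-ins for learned parameters)
W_Q = rng.normal(size=(d_model, d_k))
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_k))

# Each input vector is linearly transformed into its Query, Key, and Value vectors
Q = X @ W_Q
K = X @ W_K
V = X @ W_V

print(Q.shape, K.shape, V.shape)         # (4, 8) (4, 8) (4, 8): one q, k, v per token
```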
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Introduce weight matrices in the transformer
Generation of Query, Key, and Value Vectors in Self-Attention
In a self-attention mechanism, instead of directly comparing the raw input vectors of a sequence, each input vector is first multiplied by three separate, learned parameter matrices. This process creates three distinct representations of the original vector before they are used to calculate attention scores and output values. What is the primary analytical advantage of this approach over simply comparing the original input vectors to each other?
Learn After
Single-Step Generation with a KV Cache
Updating the KV Cache
In a self-attention layer processing an input sequence of two tokens, let the input vector for the first token be x_1 and for the second token be x_2. The layer generates a query vector q_1 (for the first token) and a key vector k_2 (for the second token). Which statement accurately describes the relationship between these inputs and generated vectors?
Correcting a Misconception in Vector Generation
Calculating a Query Vector in Self-Attention
In a standard self-attention mechanism, an input vector is transformed into three separate vectors (Query, Key, and Value) using three distinct, learned weight matrices. Imagine a modified self-attention layer where these three weight matrices are constrained to be identical. What would be the most direct consequence of this change?
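For readers who want to probe this question empirically, here is a minimal NumPy sketch of the constrained variant, with a single shared matrix standing in for all three projections (sizes and random weights are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 8))            # illustrative 4-token input (assumed sizes)
W = rng.normal(size=(8, 8))            # one shared weight matrix

# With W_Q = W_K = W_V = W, all three projections collapse into one representation
Q = K = V = X @ W

scores = Q @ K.T                       # raw attention scores before softmax
print(np.allclose(Q, K))               # True: queries and keys are identical
print(np.allclose(scores, scores.T))   # True: the raw score matrix is symmetric
```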