In a self-attention layer processing an input sequence of two tokens, let the input vector for the first token be x_1 and for the second token be x_2. The layer generates a query vector q_1 (for the first token) and a key vector k_2 (for the second token). Which statement accurately describes the relationship between these inputs and generated vectors?
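The key relationship the question probes is that each generated vector is a linear projection of its *own* token's input through a distinct learned matrix: q_1 depends only on x_1 (via W_Q), and k_2 depends only on x_2 (via W_K); the two inputs interact only later, when the attention score is computed. Below is a minimal NumPy sketch of this, using random weights and illustrative dimension names (d_model, d_k), not any particular model's values.

```python
import numpy as np

# Minimal sketch: how q_1 and k_2 are generated from x_1 and x_2.
# W_Q and W_K stand in for the layer's learned projection matrices;
# here they are random, for illustration only.
rng = np.random.default_rng(0)
d_model, d_k = 8, 4

x1 = rng.normal(size=d_model)  # input vector for token 1
x2 = rng.normal(size=d_model)  # input vector for token 2

W_Q = rng.normal(size=(d_model, d_k))  # learned query projection
W_K = rng.normal(size=(d_model, d_k))  # learned key projection

# Each vector is a projection of its *own* token's input:
q1 = x1 @ W_Q  # q_1 depends only on x_1 (through W_Q)
k2 = x2 @ W_K  # k_2 depends only on x_2 (through W_K)

# x_1 and x_2 interact only afterwards, in the scaled dot-product score:
score_12 = q1 @ k2 / np.sqrt(d_k)
print(q1.shape, k2.shape, score_12)
```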
Tags
Ch.5 Inference - Foundations of Large Language Models
Computing Sciences
Analysis in Bloom's Taxonomy
Related
Single-Step Generation with a KV Cache
Updating the KV Cache
Correcting a Misconception in Vector Generation
Calculating a Query Vector in Self-Attention
In a standard self-attention mechanism, an input vector is transformed into three separate vectors (Query, Key, and Value) using three distinct, learned weight matrices. Imagine a modified self-attention layer where these three weight matrices are constrained to be identical. What would be the most direct consequence of this change?
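The most direct consequence is that every token's query, key, and value vectors collapse into a single vector (q_i = k_i = v_i), and the pre-softmax score matrix becomes symmetric, since score(i, j) = (x_i W)(x_j W)^T = score(j, i). The following sketch, assuming a shared random matrix W in place of the three learned ones, demonstrates both effects.

```python
import numpy as np

# Sketch of the constrained case: one shared matrix W plays all three roles.
rng = np.random.default_rng(1)
d_model, d_k = 8, 4
X = rng.normal(size=(2, d_model))    # two token inputs, x_1 and x_2

W = rng.normal(size=(d_model, d_k))  # W_Q = W_K = W_V = W (the constraint)

Q = K = V = X @ W  # every token's query, key, and value are identical

# Direct consequence: the pre-softmax score matrix is symmetric,
# because score(i, j) = (x_i W)(x_j W)^T = score(j, i).
scores = Q @ K.T / np.sqrt(d_k)
print(np.allclose(scores, scores.T))  # True
```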