Learn Before
True or False: When calculating the output for the token at position i=5 in a sequence using a causal attention mechanism, the value vector from position j=6 (v_6) is incorporated into the weighted sum.
- 0 (False)
- 1 (True)
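The answer is false: causal attention restricts the weighted sum for position i to value vectors v_j with j ≤ i, so v_6 receives zero weight when i = 5. A minimal NumPy sketch (the raw scores are random placeholders, not values from the card) shows the mask forcing that weight to zero:

```python
import numpy as np

# Causal attention at query position i masks every key/value position j > i,
# so v_6 can never contribute to the output at i = 5.
i, seq_len = 5, 8
scores = np.random.randn(seq_len)          # raw attention scores for q_5 (placeholders)
scores[np.arange(seq_len) > i] = -np.inf   # causal mask: block all j > i
weights = np.exp(scores - scores.max())
weights /= weights.sum()                   # softmax over the unmasked positions

print(weights[6])  # 0.0 -- v_6 gets zero weight and drops out of the weighted sum
```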
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An auto-regressive model is processing a sequence of 4 tokens. To compute the output for the token at position i=2, it uses a causal attention mechanism. Given the value vectors and the calculated attention weights below, what is the resulting output vector for this position?

Value Vectors:
- v_0 = [1.0, 0.0]
- v_1 = [0.0, 2.0]
- v_2 = [3.0, 1.0]
- v_3 = [2.0, 2.0]

Attention Weights for position i=2:
- Weight for v_0: 0.1
- Weight for v_1: 0.3
- Weight for v_2: 0.6
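For reference, the output is the weighted sum over the visible positions j ≤ 2 (causality gives v_3 a weight of 0): 0.1·[1.0, 0.0] + 0.3·[0.0, 2.0] + 0.6·[3.0, 1.0] = [1.9, 1.2]. A minimal NumPy sketch of the same computation:

```python
import numpy as np

# Weighted sum of value vectors for position i = 2 under causal attention.
values = np.array([[1.0, 0.0],   # v_0
                   [0.0, 2.0],   # v_1
                   [3.0, 1.0],   # v_2
                   [2.0, 2.0]])  # v_3 (masked out: j = 3 > i = 2)
weights = np.array([0.1, 0.3, 0.6, 0.0])  # attention weights for i = 2

output = weights @ values
print(output)  # [1.9 1.2]
```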
When calculating the output for the token at position i=5 in a sequence using a causal attention mechanism, the value vector from position j=6 (v_6) is incorporated into the weighted sum.

Given the formula for the output of a causal attention mechanism for a specific query vector q_i, match each component of the formula to its correct description.
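The formula in question is presumably the standard causal attention output, output_i = Σ_{j=0}^{i} α_{i,j} · v_j, where α_{i,j} = softmax_{j≤i}(q_i · k_j / √d_k). A minimal Python sketch labeling each component (the function name and random test data are illustrative assumptions, not part of the original card):

```python
import numpy as np

def causal_attention_output(q_i, K, V, i):
    """Output for query q_i attending only to positions j <= i."""
    d_k = q_i.shape[-1]
    scores = K[: i + 1] @ q_i / np.sqrt(d_k)  # scaled scores q_i . k_j / sqrt(d_k), j <= i
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                      # attention weights alpha_{i,j}
    return alpha @ V[: i + 1]                 # weighted sum: sum_j alpha_{i,j} * v_j

# Illustrative data: 4 key/value pairs of dimension 2 (not from the card).
rng = np.random.default_rng(0)
K, V = rng.normal(size=(4, 2)), rng.normal(size=(4, 2))
q_2 = rng.normal(size=2)
print(causal_attention_output(q_2, K, V, i=2))
```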