Formula for Causal Attention
The output of a causal attention mechanism for a specific query vector q_i is calculated as a weighted sum of value vectors from all positions up to and including the current position i. The formula is expressed as:

o_i = Σ_{j=0}^{i} α_{i,j} · v_j

Here, α_{i,j} represents the attention weight assigned to the value vector v_j at position j when computing the output for position i.
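As a minimal sketch of this weighted sum (the function and variable names are my own, not from the course; the attention weights are assumed to come from a softmax over scaled dot-product scores, restricted to positions j ≤ i):

```python
import math

def causal_attention_output(queries, keys, values, i):
    """Compute o_i = sum over j <= i of alpha_{i,j} * v_j.

    queries, keys, values: lists of equal-length float vectors, one per position.
    Positions j > i are masked out, so the output depends only on the
    prefix up to and including position i.
    """
    d = len(keys[0])
    # Scaled dot-product scores against all visible positions j <= i.
    scores = [sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
              for j in range(i + 1)]
    # Softmax normalization: the weights alpha_{i,j} sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Weighted sum of the value vectors v_0 .. v_i.
    return [sum(a * values[j][t] for j, a in enumerate(alphas))
            for t in range(len(values[0]))]
```

Note that the loop upper bound `i + 1` is what makes the mechanism causal: value vectors at positions j > i never enter the sum.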

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Computing Sciences
Related
In a sequence processing model, the unnormalized attention score between a query at position i and a key at position j is calculated using the formula: Score(i, j) = (q_i ⋅ k_j + PE(i, j)) / √d. What is the primary function of the PE(i, j) term in this calculation? (Analyzing Components of an Attention Score Formula)
Diagnosing a Language Model's Performance Issue
Interpretation of Positional Bias as a Distance Penalty
Learn After
An auto-regressive model is processing a sequence of 4 tokens. To compute the output for the token at position i = 2, it uses a causal attention mechanism. Given the value vectors and the calculated attention weights below, what is the resulting output vector for this position?
Value Vectors:
v_0 = [1.0, 0.0], v_1 = [0.0, 2.0], v_2 = [3.0, 1.0], v_3 = [2.0, 2.0]
Attention Weights for position i=2:
- Weight for v_0: 0.1
- Weight for v_1: 0.3
- Weight for v_2: 0.6
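Carrying out the weighted sum for this example as a quick check (the variable names are my own, not part of the original card):

```python
# o_2 = 0.1*v_0 + 0.3*v_1 + 0.6*v_2; v_3 is masked out because j = 3 > i = 2.
v = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [2.0, 2.0]]
w = [0.1, 0.3, 0.6]  # attention weights for positions 0..2
o_2 = [sum(w[j] * v[j][t] for j in range(3)) for t in range(2)]
print(o_2)  # ≈ [1.9, 1.2]
```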
When calculating the output for the token at position i = 5 in a sequence using a causal attention mechanism, the value vector from position j = 6 (v_6) is incorporated into the weighted sum.

Given the formula for the output of a causal attention mechanism for a specific query vector q_i: match each component of the formula to its correct description.