Multiple Choice

In an autoregressive model, the attention output for a token is a weighted sum of the value vectors of itself and all preceding tokens. Consider a sequence of three tokens (at positions 0, 1, and 2). The value vectors are given as v_0 = [1, 2], v_1 = [3, 0], and v_2 = [4, 5]. The attention weights for the token at position 2, which determine the contribution of each token in the context, are α_2,0 = 0.1, α_2,1 = 0.6, and α_2,2 = 0.3. Based on this information, what is the attention output vector for the token at position 2?
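The weighted sum described above can be checked directly. The following sketch (plain Python, no libraries assumed) computes the attention output for position 2 from the value vectors and weights given in the question:

```python
# Value vectors v_0, v_1, v_2 from the question
v = [[1, 2], [3, 0], [4, 5]]
# Attention weights α_2,0, α_2,1, α_2,2 for the token at position 2
alpha = [0.1, 0.6, 0.3]

# Attention output: element-wise weighted sum over the context,
# rounded to suppress floating-point noise.
output = [round(sum(a * vec[d] for a, vec in zip(alpha, v)), 10)
          for d in range(len(v[0]))]
print(output)  # [3.1, 1.7]
```

Working it out by hand: 0.1·[1, 2] + 0.6·[3, 0] + 0.3·[4, 5] = [0.1 + 1.8 + 1.2, 0.2 + 0 + 1.5] = [3.1, 1.7].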


Updated 2025-09-28


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science