Learn Before
A language model is computing an output vector with a sparse attention mechanism. For the current token, the computation considers only a subset of the previous tokens, identified by the index set G = {0, 2, 3}. Given the value vectors and corresponding attention weights below, what is the correct output vector?
Value Vectors:
- v_0 = [2, 1]
- v_1 = [4, 5]
- v_2 = [6, 0]
- v_3 = [1, 3]
Attention Weights for the included set G:
- α'_0 = 0.5
- α'_2 = 0.2
- α'_3 = 0.3
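The output is the weighted sum of the value vectors over the sparse index set G only; v_1 is skipped because index 1 is not in G, and the weights α' are already renormalized so they sum to 1 over G. A minimal sketch of this computation (variable names are illustrative, not from the question):

```python
# Sparse attention output: sum over the index set G only.
value_vectors = {0: [2, 1], 1: [4, 5], 2: [6, 0], 3: [1, 3]}
weights = {0: 0.5, 2: 0.2, 3: 0.3}  # renormalized over G = {0, 2, 3}

output = [0.0, 0.0]
for i, alpha in weights.items():   # v_1 never contributes: 1 not in G
    for d in range(2):
        output[d] += alpha * value_vectors[i][d]

print([round(x, 10) for x in output])  # → [2.5, 1.4]
```

Working it by hand: 0.5·[2, 1] + 0.2·[6, 0] + 0.3·[1, 3] = [1.0 + 1.2 + 0.3, 0.5 + 0.0 + 0.9] = [2.5, 1.4].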
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Comparison of Sparse and Dense Attention Weights
Analysis of Sparse Attention Formula Components
Analyzing the Impact of the Sparse Index Set