Learn Before
A causal model is calculating the output for the token at position i=3. The model's attention mechanism is optimized to only consider a subset of previous positions. The set of contributing indices is G = {0, 2}. The attention weights for these indices are α_3,0 = 0.6 and α_3,2 = 0.4. The value vectors for the relevant positions are: v_0 = [1, 0], v_1 = [2, 2], and v_2 = [0, 3]. Based on this information, what is the final output vector for position 3?
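A short worked check of the question above, sketched in Python under the stated assumption that the output for position 3 is the weighted sum of value vectors over the contributing index set G only (positions outside G, such as v_1, get zero weight and do not contribute):

```python
# Sparse attention output: o_i = sum over j in G of alpha_{i,j} * v_j.
# Weights and vectors are taken directly from the question statement.
weights = {0: 0.6, 2: 0.4}                                # alpha_{3,j} for j in G
values = {0: [1.0, 0.0], 1: [2.0, 2.0], 2: [0.0, 3.0]}    # value vectors v_j

output = [0.0, 0.0]
for j, alpha in weights.items():
    # Accumulate alpha_{3,j} * v_j component-wise.
    output = [o + alpha * v for o, v in zip(output, values[j])]

print([round(x, 6) for x in output])  # -> [0.6, 1.2]
```

Note that v_1 = [2, 2] is a distractor: since 1 is not in G = {0, 2}, it never enters the sum, giving 0.6·[1, 0] + 0.4·[0, 3] = [0.6, 1.2].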
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Sparse Attention Output Formula
Evaluating Vector Contributions in an Optimized Attention Mechanism
Selective Computation in Optimized Attention
Index Set of Non-Zero Attention Weights (G)