Learn Before
In a specific type of attention mechanism, the history of key-value pairs up to a position i is summarized by two state variables: a matrix μ_i and a vector ν_i. They are defined as cumulative sums:
μ_i = Σ_{j=0 to i} (k'_jᵀ v_j) (sum of outer products)
ν_i = Σ_{j=0 to i} (k'_jᵀ) (sum of transformed key vectors)
Suppose you have already computed the state variables μ_i and ν_i for a sequence up to position i. To compute the next state variables, μ_{i+1} and ν_{i+1}, what is the only additional information you need?
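The incremental update implied by the cumulative-sum definitions can be sketched as follows. This is a minimal illustration, not code from the course; the function name, numpy usage, and the sample vectors in the usage example are assumptions.

```python
import numpy as np

def update_state(mu, nu, k_next, v_next):
    """Advance the cumulative-sum state by one position.

    mu_{i+1} = mu_i + outer(k'_{i+1}, v_{i+1})
    nu_{i+1} = nu_i + k'_{i+1}
    Only the next pair (k'_{i+1}, v_{i+1}) is needed, not the full history.
    """
    return mu + np.outer(k_next, v_next), nu + k_next

# Usage: start from the state at position 0, then advance one step.
mu0 = np.outer([1.0, 0.0], [3.0, 4.0])  # mu_0 = outer(k'_0, v_0)
nu0 = np.array([1.0, 0.0])              # nu_0 = k'_0
mu1, nu1 = update_state(mu0, nu0, np.array([0.0, 2.0]), np.array([5.0, 6.0]))
print(mu1)  # mu_1 = [[3, 4], [10, 12]]
print(nu1)  # nu_1 = [1, 2]
```

Because each step touches only the previous state and one new key-value pair, the update costs O(1) in sequence length.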
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a simplified attention mechanism, the history of key-value pairs up to a position i is summarized by two state variables: μ_i, which is the cumulative sum of outer products between transformed key vectors and their corresponding value vectors (Σ k'_jᵀ v_j), and ν_i, which is the cumulative sum of the transformed key vectors (Σ k'_jᵀ).

Given the following sequence of 2-dimensional vectors up to position i=2:
k'_0 = [1, 0], v_0 = [3, 4]
k'_1 = [0, 2], v_1 = [5, 6]
k'_2 = [1, 1], v_2 = [7, 8]

Calculate the state variables μ_2 and ν_2.

In a specific type of attention mechanism, the history of key-value pairs up to a position i is summarized by two state variables: a matrix μ_i and a vector ν_i. They are defined as cumulative sums:
μ_i = Σ_{j=0 to i} (k'_jᵀ v_j) (sum of outer products)
ν_i = Σ_{j=0 to i} (k'_jᵀ) (sum of transformed key vectors)

Suppose you have already computed the state variables μ_i and ν_i for a sequence up to position i. To compute the next state variables, μ_{i+1} and ν_{i+1}, what is the only additional information you need?

Computational Advantage of State Variables
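The calculation card in the Related section above can be checked numerically. This is a sketch using the card's own vectors; the numpy-based accumulation is an illustrative assumption, not the course's reference implementation.

```python
import numpy as np

# k'_0, k'_1, k'_2 and v_0, v_1, v_2 from the calculation card
keys = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
values = np.array([[3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])

mu = np.zeros((2, 2))  # running sum of outer products k'_j^T v_j
nu = np.zeros(2)       # running sum of transformed key vectors k'_j
for k, v in zip(keys, values):
    mu += np.outer(k, v)
    nu += k

print(mu)  # mu_2 = [[10, 12], [17, 20]]
print(nu)  # nu_2 = [2, 3]
```

Each loop iteration is exactly the one-step update asked about in the question card: the new state depends only on the previous state and the single new pair (k'_j, v_j).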