Multiple Choice

In a specific type of attention mechanism, the history of key-value pairs up to a position i is summarized by two state variables: a matrix μ_i and a vector ν_i. They are defined as cumulative sums:

μ_i = Σ_{j=0}^{i} k'_jᵀ v_j (sum of outer products)

ν_i = Σ_{j=0}^{i} k'_jᵀ (sum of transformed key vectors)
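To make the shapes concrete, the definitions can be evaluated directly. The following is a minimal sketch in plain Python, assuming each k'_j is a length-d_k row vector and each v_j a length-d_v row vector; the helper names `outer` and `states_by_definition` and the sample numbers are hypothetical, chosen only for illustration.

```python
def outer(k, v):
    """Outer product k^T v: a len(k) x len(v) matrix (nested lists)."""
    return [[ki * vj for vj in v] for ki in k]


def states_by_definition(keys, values, i):
    """Return (mu_i, nu_i) by summing over positions j = 0..i, as defined."""
    d_k, d_v = len(keys[0]), len(values[0])
    mu = [[0.0] * d_v for _ in range(d_k)]  # d_k x d_v matrix
    nu = [0.0] * d_k                        # length-d_k vector
    for j in range(i + 1):
        o = outer(keys[j], values[j])
        for r in range(d_k):
            nu[r] += keys[j][r]
            for c in range(d_v):
                mu[r][c] += o[r][c]
    return mu, nu


# Hypothetical transformed keys k'_j (d_k = 2) and values v_j (d_v = 3).
keys = [[1.0, 2.0], [0.5, 1.0], [2.0, 0.0]]
values = [[1.0, 0.0, 2.0], [3.0, 1.0, 0.0], [0.0, 1.0, 1.0]]

mu_1, nu_1 = states_by_definition(keys, values, 1)
print(mu_1)  # [[2.5, 0.5, 2.0], [5.0, 1.0, 4.0]]
print(nu_1)  # [1.5, 3.0]
```

Note that μ_i is always a d_k × d_v matrix and ν_i a length-d_k vector, regardless of how many positions have been summed.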

Suppose you have already computed the state variables μ_i and ν_i for a sequence up to position i. To compute the next state variables, μ_{i+1} and ν_{i+1}, what is the only additional information you need?

Updated 2025-10-03
