Learn Before
In a memory-efficient attention mechanism, the output for a token at position i is calculated using the formula: Output = (q'_i * μ_i) / (q'_i * ν_i). In this formula, q'_i is the token's processed query, while μ_i and ν_i are aggregations of historical information from all tokens up to and including position i. Specifically, μ_i aggregates past key-value products, and ν_i aggregates past keys. What is the primary function of the denominator, q'_i * ν_i?
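The aggregation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the card's reference implementation: the feature map `phi` (here ELU + 1, a common choice that keeps features positive) is an assumption, as are the array shapes. It shows μ_i accumulating key-value outer products, ν_i accumulating keys, and the denominator q'_i · ν_i normalizing the output.

```python
import numpy as np

def phi(x):
    # Assumed feature map: elu(x) + 1, keeps all entries positive
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Causal linear attention via running aggregates.

    mu accumulates past key-value outer products (sum of phi(k_j) v_j^T),
    nu accumulates past transformed keys (sum of phi(k_j)).
    Dividing by q'_i . nu_i normalizes the weighted sum so the implied
    attention weights for token i sum to 1.
    """
    d_k, d_v = Q.shape[1], V.shape[1]
    mu = np.zeros((d_k, d_v))          # aggregated key-value state
    nu = np.zeros(d_k)                 # aggregated key state
    out = np.zeros((Q.shape[0], d_v))
    for i in range(Q.shape[0]):
        q = phi(Q[i])                  # q'_i, the processed query
        k = phi(K[i])
        mu += np.outer(k, V[i])        # update mu_i with the new key-value product
        nu += k                        # update nu_i with the new key
        out[i] = (q @ mu) / (q @ nu)   # numerator / normalizing denominator
    return out
```

Because μ_i and ν_i are fixed-size running sums, each token's output needs only these two aggregates rather than the full history of keys and values, which is the memory efficiency the question refers to.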
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Efficiency of Aggregated State in Attention
Evaluating a Modification to the Linear Attention Formula
In the formula for calculating a linear attention output, Output = (q'_i * μ_i) / (q'_i * ν_i), where q'_i is the transformed query, μ_i is the accumulated key-value state, and ν_i is the accumulated key state, what is the primary role of the denominator term q'_i * ν_i?
Calculating a Linear Attention Output Vector
Recurrent Computation of μ_i and ν_i in Linear Attention