Multiple Choice

In a memory-efficient attention mechanism, the output for a token at position i is calculated using the formula: Output = (q'_i * μ_i) / (q'_i * ν_i). In this formula, q'_i is the token's processed query, while μ_i and ν_i are aggregations of historical information from all tokens up to and including position i. Specifically, μ_i aggregates past key-value products, and ν_i aggregates past keys. What is the primary function of the denominator, q'_i * ν_i?
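The computation in the question can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the course's reference implementation: the feature map `elu(x) + 1` used to produce the processed queries and keys is an assumption (the question does not say how q'_i is obtained), and the names `feature_map`, `linear_attention`, and `quadratic_reference` are hypothetical. The first function keeps the running aggregates μ_i and ν_i; the second recomputes the same output directly from all tokens up to position i, which may help when reasoning about what each term contributes.

```python
import math


def feature_map(x):
    # Assumed feature map: elu(x) + 1, which keeps every entry positive.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def linear_attention(queries, keys, values):
    """Causal attention using running aggregates mu (key-value products) and nu (keys)."""
    d_k, d_v = len(keys[0]), len(values[0])
    mu = [[0.0] * d_v for _ in range(d_k)]  # running sum of outer(k'_j, v_j), j <= i
    nu = [0.0] * d_k                        # running sum of k'_j, j <= i
    outputs = []
    for q, k, v in zip(queries, keys, values):
        q_p, k_p = feature_map(q), feature_map(k)
        # Fold the current token into the aggregates (position i is included).
        for a in range(d_k):
            nu[a] += k_p[a]
            for b in range(d_v):
                mu[a][b] += k_p[a] * v[b]
        numerator = [sum(q_p[a] * mu[a][b] for a in range(d_k)) for b in range(d_v)]
        denominator = dot(q_p, nu)
        outputs.append([n / denominator for n in numerator])
    return outputs


def quadratic_reference(queries, keys, values):
    """Same output, computed explicitly over every token up to position i."""
    d_v = len(values[0])
    outputs = []
    for i, q in enumerate(queries):
        q_p = feature_map(q)
        scores = [dot(q_p, feature_map(keys[j])) for j in range(i + 1)]
        total = sum(scores)  # equals q'_i * nu_i
        outputs.append(
            [sum(scores[j] * values[j][b] for j in range(i + 1)) / total
             for b in range(d_v)]
        )
    return outputs
```

The aggregate version stores only μ and ν, whose sizes do not grow with the sequence length, whereas the reference version revisits every earlier token at each step.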


Updated 2025-09-28


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science