Formula

General Form of Memory-Based Attention

The attention operation at a specific position $i$ that utilizes a memory component to retain contextual information can be expressed in a general form. This operation computes attention using a query vector $\mathbf{q}_i$ and a memory model $\mathrm{Mem}$. In standard attention, the memory model is defined as the complete key-value (KV) cache up to position $i$, meaning $\mathrm{Mem} = (\mathbf{K}_{\le i}, \mathbf{V}_{\le i})$. As a result, the size of $\mathrm{Mem}$ grows directly with the sequence length $i$. The general formula is:

$$\mathrm{Att}(\mathbf{q}_i, \mathrm{Mem}) = \mathrm{Att}_{\mathrm{qkv}}(\mathbf{q}_i, \mathbf{K}_{\le i}, \mathbf{V}_{\le i})$$
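
To make this concrete, here is a minimal NumPy sketch of the standard-attention case, where $\mathrm{Mem}$ is the full KV cache appended to at every step. The function name `att_qkv`, the dimensions, and the random inputs are illustrative assumptions, not taken from the source.

```python
import numpy as np

def att_qkv(q_i, K, V):
    """Scaled dot-product attention for one query against cached keys/values.

    q_i: query vector of shape (d,); K, V: caches of shape (i+1, d).
    (Illustrative sketch; names and shapes are assumptions.)
    """
    d = q_i.shape[-1]
    scores = K @ q_i / np.sqrt(d)            # one score per cached position
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()                 # normalize over positions <= i
    return weights @ V                       # weighted sum of cached values

# Standard attention: Mem is the full KV cache, so it grows with position i.
d_model, seq_len = 8, 5
rng = np.random.default_rng(0)
K_cache, V_cache = [], []
for i in range(seq_len):
    k_i, v_i, q_i = rng.normal(size=(3, d_model))
    K_cache.append(k_i)                      # Mem = (K_<=i, V_<=i)
    V_cache.append(v_i)
    out = att_qkv(q_i, np.stack(K_cache), np.stack(V_cache))
    print(f"position {i}: memory holds {len(K_cache)} key-value pairs")
```

Note how the cache length printed at each step equals $i + 1$: this is exactly the dependence of $|\mathrm{Mem}|$ on the sequence length that memory-based variants aim to bound.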
