General Form of Memory-Based Attention
The attention operation at a given position $i$ that utilizes a memory component to retain contextual information can be expressed in a general form. This operation computes attention using a query vector $\mathbf{q}_i$ and a memory model $\mathrm{Mem}_i$. In standard attention, this memory model is defined as the complete Key-Value (KV) cache up to position $i$, meaning $\mathrm{Mem}_i = \mathrm{KV}[1, i] = \{(\mathbf{k}_1, \mathbf{v}_1), \ldots, (\mathbf{k}_i, \mathbf{v}_i)\}$. As a result, the size of $\mathrm{Mem}_i$ is determined directly by the sequence length $i$. The general formula is: $\mathbf{o}_i = \mathrm{Attention}(\mathbf{q}_i, \mathrm{Mem}_i)$.
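Below is a minimal NumPy sketch of this general form, not the book's implementation: `attend` stands in for $\mathrm{Attention}(\mathbf{q}_i, \mathrm{Mem}_i)$ as plain scaled dot-product attention, and the fixed-size variant (a sliding window over the last `window` KV pairs, an assumption made here purely for illustration) shows how a bounded $\mathrm{Mem}_i$ keeps the per-step cost constant while the standard KV cache grows with $i$.

```python
import numpy as np

def attend(q, keys, values):
    # Scaled dot-product attention of one query over a memory of
    # key-value pairs: softmax(q K^T / sqrt(d)) V.
    d = q.shape[-1]
    scores = keys @ q / np.sqrt(d)           # one score per memory slot
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ values

rng = np.random.default_rng(0)
d, window = 16, 4                            # `window` is illustrative, not from the source
keys, values = [], []

for i in range(1, 9):
    q_i = rng.normal(size=d)
    keys.append(rng.normal(size=d))          # k_i
    values.append(rng.normal(size=d))        # v_i

    # Standard attention: Mem_i = KV[1, i], so the memory holds i pairs.
    out_standard = attend(q_i, np.stack(keys), np.stack(values))

    # Fixed-size memory (one possible choice): keep only the last `window`
    # pairs, so the attention cost per step stays constant.
    out_fixed = attend(q_i, np.stack(keys[-window:]), np.stack(values[-window:]))

    print(f"step {i}: standard memory = {len(keys)} pairs, "
          f"fixed memory = {min(i, window)} pairs")
```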

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Fixed-Size Memory for Constant Attention Cost
Multiple Memory Models in Attention
A language model is tasked with processing an extremely long document. How does an attention mechanism that uses a separate, fixed-size memory component to represent context differ from a standard attention mechanism in managing the information from the beginning of the document as it generates new text?
Managing Context in Long-Sequence Generation
Memory Models vs. Efficient Attention for Cache Optimization
Optimizing a Chatbot for Long Conversations
Notation for Key-Value Pairs
Architectural Strategies for Long-Context Processing
Learn After
A language model generates text token by token. At each step 'i', an attention operation computes an output using a query vector and a memory component. In a standard causal implementation, this memory component is defined as the complete set of key and value vectors from all previous steps (1 to i). Based on this definition, what is the direct relationship between the size of this memory component and the length of the generated sequence 'i'?
Sparse Attention with a Fixed Key-Value Subset
Evaluating Memory Models in Attention Mechanisms
Evaluating an Attention Mechanism for a Real-Time Application