Formula

Attention Formula in Compressive Transformer

In a multi-memory architecture like the Compressive Transformer, the attention function operates over a unified memory space. To calculate the attention for a specific query $\mathbf{q}_i$, the standard query-key-value mechanism is applied to the concatenation of the local memory ($\mathrm{Mem}$) and the compressive memory ($\mathrm{CMem}$). This relationship is mathematically expressed as:

$$\mathrm{Att}_{\mathrm{com}}(\mathbf{q}_i, \mathrm{Mem}, \mathrm{CMem}) = \mathrm{Att}_{\mathrm{qkv}}(\mathbf{q}_i, [\mathrm{Mem}, \mathrm{CMem}])$$
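The sketch below illustrates this lookup in NumPy. It is a minimal, single-head example, not the reference implementation: the projection matrices `W_q`, `W_k`, `W_v`, the shapes, and the toy data are all illustrative assumptions, and the only point it demonstrates is that $\mathrm{Att}_{\mathrm{com}}$ reduces to ordinary query-key-value attention once $\mathrm{Mem}$ and $\mathrm{CMem}$ are concatenated along the memory-slot axis.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()


def att_qkv(q, memory, W_q, W_k, W_v):
    """Standard query-key-value attention of one query vector over a memory matrix."""
    query = q @ W_q                                   # (d_k,)
    keys = memory @ W_k                               # (n, d_k)
    values = memory @ W_v                             # (n, d_v)
    scores = keys @ query / np.sqrt(keys.shape[-1])   # scaled dot-product scores, (n,)
    weights = softmax(scores)                         # attention distribution over all memory slots
    return weights @ values                           # weighted sum of values, (d_v,)


def att_com(q_i, mem, cmem, W_q, W_k, W_v):
    """Att_com: apply Att_qkv to the unified memory [Mem, CMem]."""
    unified = np.concatenate([mem, cmem], axis=0)     # concatenate along the slot axis
    return att_qkv(q_i, unified, W_q, W_k, W_v)


# Toy usage with random states (illustrative shapes only, not real model activations).
d = 8
rng = np.random.default_rng(0)
q_i = rng.normal(size=d)
mem = rng.normal(size=(16, d))    # local memory: recent, uncompressed hidden states
cmem = rng.normal(size=(4, d))    # compressive memory: compressed older states
W_q = W_k = W_v = np.eye(d)       # identity projections, purely for illustration
output = att_com(q_i, mem, cmem, W_q, W_k, W_v)       # (d,) context vector
```

Because the two memories are merged before the softmax, a single attention distribution is computed over both recent and compressed history, so no separate gating between the memories is needed in this formulation.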
