Learn Before
Attention Formula in Compressive Transformer
In a multi-memory architecture like the Compressive Transformer, the attention function operates over a unified memory space. To calculate the attention output for a specific query ($\mathbf{q}$), the standard query-key-value mechanism is applied to the concatenation of the local memory ($\mathrm{Mem}$) and the compressive memory ($\mathrm{CMem}$). This relationship is mathematically expressed as:

$$\mathbf{o} = \mathrm{Attention}\left(\mathbf{q},\ [\mathrm{CMem};\ \mathrm{Mem}]\right)$$

where $[\mathrm{CMem};\ \mathrm{Mem}]$ denotes the concatenated memory from which the keys and values are drawn.
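As a rough illustration only (not taken from the course), the sketch below shows single-head scaled dot-product attention in Python/NumPy where the keys and values come from the concatenation of a compressive memory and a local memory. The function name attention_over_dual_memory and the arguments q, comp_mem, and local_mem are placeholders chosen for this example; a real layer would also apply learned key/value projections.

import numpy as np

def attention_over_dual_memory(q, comp_mem, local_mem):
    """Attention for one query vector q over the unified memory [CMem; Mem].

    q:         (d,)   query for the current token
    comp_mem:  (m, d) compressed representations of older context (CMem)
    local_mem: (n, d) recent, uncompressed context (Mem)
    """
    # Unified memory space: compressive memory followed by local memory.
    memory = np.concatenate([comp_mem, local_mem], axis=0)   # (m + n, d)

    # Keys and values are taken directly from the memory to keep the sketch
    # minimal; in practice they are learned projections of the memory.
    keys, values = memory, memory

    scores = keys @ q / np.sqrt(q.shape[-1])                  # (m + n,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                   # softmax

    return weights @ values                                    # (d,)

# Example: 4 compressed slots, 8 local slots, model width 16.
rng = np.random.default_rng(0)
out = attention_over_dual_memory(rng.normal(size=16),
                                 rng.normal(size=(4, 16)),
                                 rng.normal(size=(8, 16)))
print(out.shape)  # (16,)

The key point the sketch captures is that the query attends over a single concatenated memory, so recent high-fidelity context and compressed long-range context are scored and mixed by the same softmax.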
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Attention Formula in Compressive Transformer
Segment-based Operation in Compressive Transformer
FIFO Memory Update in Compressive Transformer
Differential Compression in Compressive Transformer Memory
A language model is designed with two distinct memory components for its attention mechanism: a fixed-size memory for recent, high-fidelity context and a separate fixed-size memory for a compressed representation of older context. What is the primary architectural advantage of this dual-memory approach for processing very long sequences?
Memory Dynamics in a Dual-Cache System
A transformer model is designed to handle long sequences using a dual-memory system: a fixed-size local memory for recent, uncompressed context and a fixed-size compressed memory for older context. Arrange the following steps in the correct chronological order to describe how this system processes and archives a new segment of information.
Your team is documenting the memory subsystem of a...
You are reviewing two candidate memory designs for...
You’re deploying an internal LLM assistant that mu...
You’re designing an internal LLM feature that moni...
Post-Incident Review: Memory Design for Long-Running Customer Support Chats
Diagnosing Long-Range Failures in a Segment-Processed LLM with Dual Memory
Choosing a Memory Architecture for Long-Context Enterprise Summarization
Postmortem: Long-Document QA Failures Under Fixed-Window vs Compressive Memory
Selecting and Justifying a Long-Context Memory Design for a Regulated Audit Assistant
Incident Triage: Long-Running Agent Workflow with Windowed vs Compressive Memory
Learn After
Evaluating a Dual-Memory Attention Mechanism
A team is developing a language model for processing lengthy legal documents. They use a dual-memory architecture: a 'local memory' that stores the most recent 1024 tokens and a 'compressive memory' that stores a summarized representation of older text. To allow a query (representing a new token) to access information from both recent and long-term history, how should the attention mechanism be structured?
Functional Role of Memory Concatenation in Attention