Concept

Memory-Based Attention as a Form of Internal Memory

As an alternative to efficient attention methods like sparse or linear attention, the context from preceding tokens can be explicitly encoded using an additional memory model. In this approach, a memory component, denoted as Mem, is used to represent and retain the contextual information from the keys and values, often in a fixed-size format. This strategy aims to manage the growing Key-Value (KV) cache as inference proceeds.
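To make this concrete, below is a minimal sketch of one way such a fixed-size memory can work, in the style of linear-attention associative memories (e.g., Katharopoulos et al., 2020; a related idea underlies Infini-attention). Each new key-value pair is folded into a constant-size matrix Mem via an outer-product write, and queries read from Mem with a normalized retrieval. The class name, feature map, and shapes are illustrative assumptions, not a specific implementation from this chapter.

```python
import torch
import torch.nn.functional as F

def feature_map(x: torch.Tensor) -> torch.Tensor:
    # Positive feature map commonly used in linear attention.
    return F.elu(x) + 1.0

class FixedSizeKVMemory:
    """Fixed-size memory Mem summarizing past keys and values.

    Instead of appending to an ever-growing KV cache, each step folds
    the new (key, value) pair into a d_k x d_v matrix, so storage stays
    constant regardless of how many tokens have been processed.
    Illustrative sketch only.
    """

    def __init__(self, d_k: int, d_v: int):
        self.mem = torch.zeros(d_k, d_v)   # Mem: accumulated key-value associations
        self.z = torch.zeros(d_k)          # running normalizer over encoded keys

    def update(self, k: torch.Tensor, v: torch.Tensor) -> None:
        phi_k = feature_map(k)             # shape (d_k,)
        self.mem += torch.outer(phi_k, v)  # write: Mem <- Mem + phi(k) v^T
        self.z += phi_k

    def read(self, q: torch.Tensor) -> torch.Tensor:
        phi_q = feature_map(q)
        # read: attention-like retrieval from Mem, normalized by phi(q) . z
        return (phi_q @ self.mem) / (phi_q @ self.z + 1e-6)

# Usage: the memory footprint is identical after 10 tokens or 10,000.
d_k, d_v = 64, 64
mem = FixedSizeKVMemory(d_k, d_v)
for _ in range(10_000):
    k, v = torch.randn(d_k), torch.randn(d_v)
    mem.update(k, v)
out = mem.read(torch.randn(d_k))           # query the compressed context
print(out.shape)                           # torch.Size([64])
```

In this sketch, per-token compute and storage are O(d_k * d_v), independent of sequence length, which is precisely the property that keeps the KV cache from growing during inference.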


