Multiple Memory Models in Attention
Memory-based attention can be extended to use more than one memory component, motivated by the observation that both local and long-term context are valuable to attention models. Distinct memories then handle different kinds of information: for example, a short-term memory holds the exact key-value pairs of recent tokens, while a second memory stores a compressed summary of older history.
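As a rough illustration, the sketch below is a minimal single-head, single-query version of this idea, assuming PyTorch; the function names (`dual_memory_attention`, `compress`), the tensor sizes, and the 2-to-1 average pooling are illustrative assumptions, not a specific published design. Recent key-value pairs are kept exact, older ones are pooled into a smaller long-term memory, and one softmax attends over both memories at once.

```python
import torch
import torch.nn.functional as F

d = 64  # model / head dimension (illustrative)

def compress(kv, rate=2):
    """Toy compression: average every `rate` consecutive vectors,
    shrinking older context into a coarse long-term summary."""
    n = kv.shape[0] - kv.shape[0] % rate            # drop any ragged tail
    return kv[:n].reshape(n // rate, rate, -1).mean(dim=1)

def dual_memory_attention(query, local_k, local_v, long_k, long_v):
    """Single-query attention over two memories: recent (uncompressed)
    key-value pairs plus a compressed summary of older ones."""
    keys = torch.cat([long_k, local_k], dim=0)      # [m_long + m_local, d]
    values = torch.cat([long_v, local_v], dim=0)
    scores = query @ keys.T / d ** 0.5              # [1, m_long + m_local]
    return F.softmax(scores, dim=-1) @ values       # [1, d]

# Usage: recent context stays exact, older context is pooled 2-to-1.
local_k, local_v = torch.randn(16, d), torch.randn(16, d)
old_k, old_v = torch.randn(128, d), torch.randn(128, d)
long_k, long_v = compress(old_k), compress(old_v)   # 128 -> 64 summary slots
query = torch.randn(1, d)
out = dual_memory_attention(query, local_k, local_v, long_k, long_v)
print(out.shape)  # torch.Size([1, 64])
```

Concatenating the two memories before the softmax is only one way to combine them; other designs score each memory separately or gate between the local and long-term outputs.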
