Learn Before
Classification of Memory Models in LLMs
Memory models designed to address context length limitations in Large Language Models (LLMs) fall into two broad categories. Internal memories are integrated into the model itself and operate by updating the Key-Value (KV) cache. External memories, in contrast, are independent modules that retrieve and supply large amounts of contextual information to the LLM.
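The two categories can be contrasted in a minimal sketch. All class and method names below are hypothetical illustrations, not APIs from any real LLM library: the internal memory appends per-token key/value states to a cache inside the model, while the external memory is a separate store queried at inference time (here with a crude word-overlap score standing in for real vector search).

```python
class InternalMemory:
    """Internal memory: lives inside the model; each generated token
    appends its key/value vectors to the KV cache."""
    def __init__(self):
        self.kv_cache = []  # one (key, value) pair per token

    def update(self, key, value):
        self.kv_cache.append((key, value))

    def __len__(self):
        return len(self.kv_cache)


class ExternalMemory:
    """External memory: an independent store; retrieved entries are
    handed to the model as extra context rather than cached inside it."""
    def __init__(self):
        self.store = []

    def add(self, text):
        self.store.append(text)

    def retrieve(self, query, k=1):
        # crude word-overlap relevance score (a real system would
        # use embeddings and a vector index)
        def score(doc):
            return len(set(query.split()) & set(doc.split()))
        return sorted(self.store, key=score, reverse=True)[:k]


internal = InternalMemory()
for t in range(3):
    internal.update(f"k{t}", f"v{t}")

external = ExternalMemory()
external.add("the cat sat on the mat")
external.add("stock prices fell sharply")

print(len(internal))                            # cached token states
print(external.retrieve("where did the cat sit"))
```

The key structural difference is where the state lives: the KV cache grows with every token the model processes, while the external store is populated ahead of time and only its retrieved slices ever enter the model's context.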
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Architectural Adaptation of LLMs for Long Sequences
Linear Attention
Classification of Memory Models in LLMs
Memory Models in LLMs as Context Encoders
PagedAttention for KV Cache Memory Optimization
Strategies for Mitigating KV Cache Memory Usage
A machine learning engineer is deploying a large language model and finds that the system frequently runs out of memory during inference. They are investigating two specific high-load scenarios, both of which involve processing a total of 16,000 tokens:
- Scenario X: Processing a batch of 32 user requests simultaneously, where each request has a context length of 500 tokens.
- Scenario Y: Processing a single user request that involves summarizing a very long document with a context length of 16,000 tokens.
Based on how attention states (keys and values) are managed during inference, which statement best analyzes the memory consumption issue?
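The arithmetic behind the two scenarios can be checked directly. The sketch below assumes illustrative model dimensions (roughly a 7B-class transformer: 32 layers, 32 heads, head dimension 128, 2-byte fp16 entries); the specific numbers are assumptions, but the structure of the formula is general: the KV cache holds two tensors (K and V) per layer, each sized by batch x heads x sequence length x head dimension.

```python
def kv_cache_bytes(batch, seq_len, n_layers=32, n_heads=32,
                   head_dim=128, dtype_bytes=2):
    """Approximate KV-cache size: 2 tensors (K and V) per layer, each
    of shape [batch, n_heads, seq_len, head_dim], dtype_bytes each.
    Parameter defaults are illustrative (roughly a 7B-class model)."""
    return 2 * n_layers * batch * n_heads * seq_len * head_dim * dtype_bytes

x = kv_cache_bytes(batch=32, seq_len=500)     # Scenario X: 32 x 500 tokens
y = kv_cache_bytes(batch=1,  seq_len=16_000)  # Scenario Y: 1 x 16,000 tokens

print(x == y)      # True: both cache 16,000 token states in total
print(x / 2**30)   # ~7.8 GiB under these assumed dimensions
```

Because KV memory scales with the total number of cached token states (batch times sequence length), both scenarios demand essentially the same cache capacity; what differs is per-sequence allocation and fragmentation behavior, not the aggregate footprint.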
Architectural Shift in LLMs due to Long-Sequence Limitations
Diagnosing Inference Failures with Long Documents
Analyzing Memory Constraints in Different LLM Applications
Learn After
Internal Memory in LLMs
External Memory for LLMs
A team of engineers is developing a system to help a language model process an entire book. Their approach involves storing the book's text in a separate, searchable vector database. When a user asks a question about the book, a retrieval mechanism first finds the most relevant paragraphs from the database and then provides only those paragraphs to the language model as context to generate an answer. How would this approach to managing long-term information be best classified?
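The retrieve-then-read pipeline described in this exercise can be sketched as follows. Everything here is a hypothetical toy: the paragraphs, the word-overlap scorer (standing in for embedding similarity over a vector database), and the function names are all illustrative assumptions.

```python
# Toy "book" split into paragraphs; a real system would chunk the
# full text and index embeddings in a vector database.
book_paragraphs = [
    "Chapter 1: Ishmael decides to go to sea.",
    "Chapter 36: Ahab nails a gold doubloon to the mast.",
    "Chapter 135: The whale destroys the Pequod.",
]

def retrieve(query, paragraphs, k=2):
    """Rank paragraphs by crude word overlap with the query."""
    q = set(query.lower().split())
    return sorted(paragraphs,
                  key=lambda p: len(q & set(p.lower().split())),
                  reverse=True)[:k]

def build_prompt(query, paragraphs):
    """Only the retrieved paragraphs enter the model's context window."""
    context = "\n".join(retrieve(query, paragraphs))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("what destroys the pequod", book_paragraphs))
```

The defining trait for classification purposes is that the store and the retriever sit entirely outside the model: the model's own parameters and KV cache are untouched, and long-term information reaches it only through the retrieved context.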
You are evaluating different strategies designed to help a language model process information beyond its standard context window. Match each described strategy to the correct classification of memory model.
Architectural Design for a Knowledge-Base Chatbot