Applying Sliding Window Attention
Based on the scenario below, which words' corresponding key and value pairs would be included in the memory component for the attention calculation at this specific step? Explain your reasoning.
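As a concrete illustration (not part of the original card), the sketch below shows how a fixed-size sliding window determines which tokens' key/value pairs remain in memory at a given step. The token list, the window size n_c=4, and the function name are hypothetical choices for the example.

```python
# Minimal sketch (assumed example, not from the source): a fixed-size sliding
# window keeps only the n_c most recent tokens' key/value pairs available for
# the attention calculation at each step.
def window_memory(tokens, step, n_c):
    """Return the tokens whose key/value pairs are still cached at `step`
    (0-indexed), assuming the window covers the n_c most recent positions
    up to and including the current one."""
    start = max(0, step - n_c + 1)
    return tokens[start:step + 1]

# Hypothetical scenario: window of 4 tokens, attending at step 6 ("mat").
tokens = ["the", "cat", "sat", "on", "the", "soft", "mat"]
print(window_memory(tokens, step=6, n_c=4))  # ['on', 'the', 'soft', 'mat']
```

Only the tokens inside the current window contribute keys and values; anything older has already been evicted from the memory component.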
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Formula for Fixed-Size Window Memory
Window-based Cache as an Example of Fixed-Size Memory
Space Complexity of Sliding Window Attention
Window Size (n_c)
A language model is designed to process extremely long sequences of text, and its developers are concerned about computational resources. They are considering two approaches for the attention mechanism: one that considers all previous tokens in the sequence, and another that only considers a fixed-size window of the 100 most recent tokens. What is the fundamental trade-off between these two approaches?
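A small numerical sketch can make the trade-off concrete. The 100-token window comes from the question above; the function names and sequence lengths are illustrative assumptions.

```python
# Minimal sketch (assumed numbers): key/value entries that must be held in
# memory under full attention vs a fixed 100-token sliding window.
def kv_entries_full(seq_len):
    return seq_len            # grows linearly with the sequence: O(n)

def kv_entries_window(seq_len, n_c=100):
    return min(seq_len, n_c)  # bounded by the window size: O(n_c)

for n in (50, 1_000, 100_000):
    print(f"len={n:>7}  full={kv_entries_full(n):>7}  "
          f"window={kv_entries_window(n):>7}")
```

The window keeps memory (and per-step compute) constant, but any token more than 100 positions back is no longer visible to the attention calculation, so long-range dependencies can be lost.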
Applying Sliding Window Attention
In an attention mechanism that uses a fixed-size sliding window, the memory required to store the keys and values for the attention calculation stays bounded by the window size; it does not grow as the input sequence gets longer.
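One minimal way to see this bounded behavior is a window-based cache built on a length-limited buffer. The class name and interface below are hypothetical, chosen only to illustrate the fixed-size property.

```python
# Minimal sketch (assumed interface): a window-based KV cache whose size can
# never exceed the window size n_c, regardless of sequence length.
from collections import deque

class WindowKVCache:
    def __init__(self, n_c):
        self.keys = deque(maxlen=n_c)    # oldest entries are evicted
        self.values = deque(maxlen=n_c)  # automatically once the window is full

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)

cache = WindowKVCache(n_c=100)
for t in range(10_000):                  # a long input sequence...
    cache.append(f"k{t}", f"v{t}")
print(len(cache))                        # ...but the cache stays at 100
```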
Your team is documenting the memory subsystem of a...
You are reviewing two candidate memory designs for...
You’re deploying an internal LLM assistant that mu...
You’re designing an internal LLM feature that moni...
Post-Incident Review: Memory Design for Long-Running Customer Support Chats
Diagnosing Long-Range Failures in a Segment-Processed LLM with Dual Memory
Choosing a Memory Architecture for Long-Context Enterprise Summarization
Postmortem: Long-Document QA Failures Under Fixed-Window vs Compressive Memory
Selecting and Justifying a Long-Context Memory Design for a Regulated Audit Assistant
Incident Triage: Long-Running Agent Workflow with Windowed vs Compressive Memory