Memory Management System Analysis
Analyze the two scenarios described in the case study. Which scenario (A or B) most likely represents a system that does not divide the KV cache into smaller, fixed-size blocks? Justify your answer by explaining how the memory allocation behavior described in that scenario relates to the problem of memory fragmentation.
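A minimal sketch of the contrast this question turns on, assuming a toy memory model: physical memory is a row of equal-sized slots of `BLOCK_SIZE_MB` each (a hypothetical parameter), and the names `Memory`, `allocate_contiguous`, `allocate_paged`, and `fragmented_memory` are illustrative helpers, not part of any library. The contiguous allocator needs one unbroken run of free slots, so it can fail even when enough total memory is free. The block-based allocator accepts any free blocks and records their locations in a block table.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical toy model: memory is measured in fixed 10 MB slots.
BLOCK_SIZE_MB = 10


@dataclass
class Memory:
    # One entry per BLOCK_SIZE_MB slot of physical memory; True means in use.
    used: list[bool]

    def free_slots(self) -> list[int]:
        return [i for i, in_use in enumerate(self.used) if not in_use]


def allocate_contiguous(mem: Memory, size_mb: int) -> list[int] | None:
    """Reserve one unbroken run of free slots, or return None."""
    need = -(-size_mb // BLOCK_SIZE_MB)  # ceiling division
    run_start, run_len = 0, 0
    for i, in_use in enumerate(mem.used):
        if in_use:
            run_len = 0  # a used slot breaks the current run
            continue
        if run_len == 0:
            run_start = i
        run_len += 1
        if run_len == need:
            slots = list(range(run_start, run_start + need))
            for s in slots:
                mem.used[s] = True
            return slots
    # Enough total memory may be free, just not in one piece (fragmentation).
    return None


def allocate_paged(mem: Memory, size_mb: int) -> list[int] | None:
    """Reserve any free fixed-size blocks; the result acts as a block table."""
    need = -(-size_mb // BLOCK_SIZE_MB)
    free = mem.free_slots()
    if len(free) < need:
        return None  # fails only when total free memory is insufficient
    table = free[:need]  # blocks need not be adjacent
    for s in table:
        mem.used[s] = True
    return table


def fragmented_memory() -> Memory:
    # 20 slots alternating free/used: 100 MB free, but no two free slots adjacent.
    return Memory(used=[i % 2 == 1 for i in range(20)])
```

With 100 MB free but split into 10 MB pieces, `allocate_contiguous(fragmented_memory(), 50)` finds no run of five adjacent free slots and returns `None`. `allocate_paged(fragmented_memory(), 50)` succeeds by taking five non-adjacent blocks. Real paged KV-cache systems such as vLLM's PagedAttention size blocks in tokens rather than megabytes, and the attention kernel uses the per-sequence block table to locate the scattered blocks.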
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An inference server has 100MB of total free memory for its KV cache, but this memory is fragmented into ten separate, non-contiguous 10MB chunks. A new request arrives that requires a 50MB block of memory for its KV cache. How would a system using a standard attention mechanism and a system using PagedAttention likely respond to this request?
Memory Allocation Failure Analysis