Multiple Choice

An inference server has 100MB of total free memory for its KV cache, but this memory is fragmented into ten separate, non-contiguous 10MB chunks. A new request arrives that requires a 50MB block of memory for its KV cache. How would a system using a standard attention mechanism and a system using PagedAttention likely respond to this request?
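In short, the standard system would likely reject the request with an out-of-memory error, because it needs one contiguous 50MB region and the largest available chunk is only 10MB, while a PagedAttention system could serve it by mapping the KV cache onto five of the ten non-contiguous 10MB pages. The Python sketch below illustrates the contrast under the question's numbers; the function names and the fixed 10MB page size are illustrative assumptions, not vLLM's actual API.

```python
# Minimal sketch (illustrative, not vLLM's implementation): the same
# fragmented pool seen by a contiguous allocator vs. a paged allocator.

CHUNK_MB = 10
FREE_CHUNKS = [CHUNK_MB] * 10  # ten non-contiguous 10MB chunks, 100MB total


def contiguous_alloc(request_mb: int) -> bool:
    # A conventional serving system reserves the KV cache as one
    # contiguous region, so it needs a single free chunk >= the request.
    return any(chunk >= request_mb for chunk in FREE_CHUNKS)


def paged_alloc(request_mb: int, page_mb: int = CHUNK_MB) -> bool:
    # PagedAttention maps the KV cache onto fixed-size pages that need
    # not be adjacent, so any set of free pages covering the request works.
    pages_needed = -(-request_mb // page_mb)  # ceiling division: 50MB -> 5
    return len(FREE_CHUNKS) >= pages_needed


print(contiguous_alloc(50))  # False: largest contiguous chunk is only 10MB
print(paged_alloc(50))       # True: five of the ten 10MB pages suffice
```

The design point the question probes is exactly this decoupling: by allocating the KV cache in small fixed-size pages, PagedAttention makes total free memory, rather than the largest contiguous block, the binding constraint.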


Tags: Ch.5 Inference - Foundations of Large Language Models; Analysis in Bloom's Taxonomy