Formula for Prefix Cache State Generation
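During the prefilling phase for an input sequence $\mathbf{x} = x_0 x_1 \dots x_m$, we generate a sequence of prefixes and their corresponding Key-Value (KV) cache states. This mapping is defined as:
$$\mathbf{x}_{\le i} \mapsto \mathrm{Cache}_{\le i}, \quad i = 0, 1, \dots, m$$
where $\mathrm{Cache}_{\le i}$ denotes the KV cache state for the prefix $\mathbf{x}_{\le i} = x_0 \dots x_i$. All these mappings can be stored in the prefix cache for efficient reuse.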
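The prefix-to-state mapping can be illustrated with a minimal Python sketch. It is not taken from the source: `toy_kv`, `build_prefix_cache`, and `longest_cached_prefix` are hypothetical names, and the key/value "vectors" are placeholder strings standing in for the tensors a real model would compute. The sketch assumes the cache state for a prefix is simply the list of per-token key/value pairs seen so far, so each state extends the previous one by a single entry.

```python
from typing import Dict, List, Tuple

KV = Tuple[str, str]          # placeholder for one token's (key, value) pair
Prefix = Tuple[str, ...]      # a prefix of the token sequence, used as a dict key


def toy_kv(token: str) -> KV:
    """Hypothetical stand-in for a model's key/value projections."""
    return (f"k({token})", f"v({token})")


def build_prefix_cache(tokens: List[str]) -> Dict[Prefix, Tuple[KV, ...]]:
    """Map every non-empty prefix of `tokens` to its KV cache state."""
    prefix_cache: Dict[Prefix, Tuple[KV, ...]] = {}
    state: Tuple[KV, ...] = ()
    for i, token in enumerate(tokens):
        # The state for prefix i is the state for prefix i-1 plus one new entry.
        state = state + (toy_kv(token),)
        prefix_cache[tuple(tokens[: i + 1])] = state
    return prefix_cache


def longest_cached_prefix(
    prefix_cache: Dict[Prefix, Tuple[KV, ...]], tokens: List[str]
) -> Tuple[Prefix, Tuple[KV, ...]]:
    """Return the longest stored prefix of `tokens` and its cached state."""
    for n in range(len(tokens), 0, -1):
        key = tuple(tokens[:n])
        if key in prefix_cache:
            return key, prefix_cache[key]
    return (), ()


cache = build_prefix_cache(["The", "cat", "sat"])
hit, state = longest_cached_prefix(cache, ["The", "cat", "ran"])
print(hit, len(state))
```

In this toy setting, a new request that shares the prefix `['The', 'cat']` can reuse the stored state and only needs fresh key/value entries for the tokens after the shared prefix.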

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A system is generating a series of stored Key-Value (KV) cache states for the input sequence of tokens [A, B, C, D]. One stored state, cache_BC, corresponds to the prefix [A, B]. Another stored state, cache_BCD, corresponds to the prefix [A, B, C]. What is the relationship between cache_BC and cache_BCD?
A system is designed to generate and store a complete set of Key-Value (KV) cache states for all possible prefixes of the input token sequence ['The', 'cat', 'sat']. Arrange the following events in the correct chronological order in which they would occur during this process.
Formula for Prefix Cache State Generation
Applying the Prefix Cache Generation Process
Learn After
An auto-regressive model is processing the input sequence of tokens: ['The', 'cat', 'sat']. When the model uses the prefix ['The', 'cat'] to generate the next token, 'sat', what is the content of the corresponding Key-Value (KV) cache state that is created at this step?
An auto-regressive model is generating a series of Key-Value (KV) cache states for the input sequence of tokens: ['The', 'quick', 'brown']. Arrange the following events in the correct chronological order in which they occur during this process.
Prefix Cache Reuse Scenario