Learn Before
An inference system for a large model has previously processed the input 'The best movie of all time is' and has stored the corresponding internal states in a cache. A new user then submits the input 'The best movie of the year is'. How will the system most efficiently use the cache to process this new request?
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Computational Efficiency of Prefix Cache Utilization
A new input sequence is provided to a language model that uses a prefix cache for inference. Arrange the following steps in the correct chronological order to describe how the system utilizes the cache to process this new sequence.
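The cache lookup described in the movie example above can be illustrated with a toy sketch. It is not how any particular inference engine works: the class `ToyPrefixCache`, the helpers `shared_prefix_len`, `fake_state`, and `process`, and the whitespace tokenization are all hypothetical names and simplifications chosen for illustration. Real systems cache key/value tensors per layer and use subword tokenizers. The sketch only shows the core idea: find the longest cached prefix that matches the new input, reuse its stored states, and compute states only for the remaining tokens.

```python
from typing import List, Tuple


def shared_prefix_len(a: List[str], b: List[str]) -> int:
    """Count how many leading tokens are identical in both sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def fake_state(tokens: List[str], i: int) -> str:
    # Hypothetical stand-in for a per-token internal state. It depends on
    # every token up to position i, which is why only a matching *prefix*
    # can be reused, not a matching token later in the sequence.
    return "kv(" + " ".join(tokens[: i + 1]) + ")"


class ToyPrefixCache:
    """Stores (tokens, per-token states) pairs from earlier requests."""

    def __init__(self) -> None:
        self._entries: List[Tuple[List[str], List[str]]] = []

    def store(self, tokens: List[str], states: List[str]) -> None:
        self._entries.append((list(tokens), list(states)))

    def longest_prefix(self, tokens: List[str]) -> List[str]:
        """Return the cached states for the longest matching prefix."""
        best: List[str] = []
        for cached_tokens, cached_states in self._entries:
            n = shared_prefix_len(cached_tokens, tokens)
            if n > len(best):
                best = cached_states[:n]
        return best


def process(cache: ToyPrefixCache, text: str) -> Tuple[int, int]:
    """Process a request; return (tokens reused, tokens newly computed)."""
    tokens = text.split()  # assumption: whitespace tokenization
    reused = cache.longest_prefix(tokens)
    states = list(reused)
    # Compute states only for the tokens after the shared prefix.
    for i in range(len(reused), len(tokens)):
        states.append(fake_state(tokens, i))
    cache.store(tokens, states)
    return len(reused), len(tokens) - len(reused)


cache = ToyPrefixCache()
first = process(cache, "The best movie of all time is")
second = process(cache, "The best movie of the year is")
print(first, second)
```

Under these assumptions, the first request finds nothing to reuse and computes all seven tokens. The second request reuses the states for the shared prefix "The best movie of" and computes only "the year is". The final "is" appears in both inputs but is recomputed, because its state depends on the tokens that precede it.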