A text generation system is designed to accelerate inference by storing the pre-computed internal states of common input prefixes in a key-value datastore. When a new request is received, the system attempts to leverage this datastore. Arrange the following actions into the correct chronological sequence that the system follows to process the new request.
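The flow this question describes can be sketched in a few lines. This is a minimal sketch built on several assumptions. Whitespace splitting stands in for a real tokenizer. A string per token stands in for the stored key/value tensors. The helpers `tokenize` and `compute_states` and the class `PrefixCache` are hypothetical. The sketch shows one plausible ordering: tokenize the request, look up the longest cached prefix, compute states only for the uncached suffix, then store the result. It is not necessarily the exact set of actions the question lists.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a model's internal state: one entry per token
# (a real system would hold key/value tensors for every layer).
State = List[str]


def tokenize(text: str) -> List[str]:
    # Assumption: whitespace splitting stands in for a real tokenizer.
    return text.split()


def compute_states(tokens: List[str], past: Optional[State] = None) -> State:
    # Hypothetical forward pass: extends the reused state with entries for new tokens only.
    base = list(past) if past else []
    return base + [f"kv({t})" for t in tokens]


@dataclass
class PrefixCache:
    # Key: the exact token sequence of a prefix; value: its pre-computed state.
    store: Dict[Tuple[str, ...], State] = field(default_factory=dict)

    def longest_cached_prefix(self, tokens: List[str]) -> Tuple[int, Optional[State]]:
        # Try the longest prefix first so that the most computation is reused.
        for n in range(len(tokens), 0, -1):
            state = self.store.get(tuple(tokens[:n]))
            if state is not None:
                return n, state
        return 0, None

    def process(self, text: str) -> Tuple[State, int]:
        tokens = tokenize(text)                              # 1. tokenize the request
        reused, cached = self.longest_cached_prefix(tokens)  # 2. look up the datastore
        states = compute_states(tokens[reused:], cached)     # 3. compute only the uncached suffix
        self.store[tuple(tokens)] = states                   # 4. store the state for later requests
        return states, reused


cache = PrefixCache()
first, _ = cache.process("you are a helpful assistant . hi")
second, reused = cache.process("you are a helpful assistant . hi there")
print(reused)  # number of prefix tokens whose state was reused
```

The linear scan over prefix lengths keeps the sketch short, but it costs one lookup per candidate length. Production serving systems more commonly cache at a fixed block granularity so that each lookup is a single constant-time probe.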
Tags
Ch.5 Inference - Foundations of Large Language Models
Related
An engineering team is building a system to accelerate text generation by storing and reusing the pre-computed internal states for common initial phrases (prefixes). They are using a key-value datastore where each key must uniquely identify a prefix and map to its corresponding stored state. To ensure the fastest possible retrieval of these states, which of the following strategies for creating the 'key' from a prefix is the most effective?
Diagnosing Prefix Cache Inefficiency
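For the key-design question above, one common approach is to hash the prefix's token IDs into a fixed-length digest. A hash map can then compare and retrieve these keys in constant expected time, no matter how long the prefix is. The sketch below assumes a few things. Token IDs are non-negative integers below 2^32. Keys are built per fixed-size block, using the hypothetical `BLOCK_SIZE` and `prefix_block_keys`. Each block's key also hashes the previous block's key, so it identifies the entire prefix up to that point and not just the block's own tokens. This is an illustration, not the method the question expects.

```python
import hashlib
import struct
from typing import List

BLOCK_SIZE = 4  # hypothetical block size, kept small for illustration


def prefix_block_keys(token_ids: List[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Return one fixed-length hex key per full block of tokens.

    Each key hashes the parent block's digest plus this block's token IDs,
    so equal keys imply (up to SHA-256 collisions) equal prefixes.
    Assumes every token ID fits in an unsigned 32-bit integer.
    """
    keys: List[str] = []
    parent = b""
    full = len(token_ids) - len(token_ids) % block_size  # ignore a trailing partial block
    for start in range(0, full, block_size):
        block = token_ids[start:start + block_size]
        h = hashlib.sha256()
        h.update(parent)                                    # chain in the prefix so far
        h.update(struct.pack(f"<{len(block)}I", *block))    # this block's token IDs
        digest = h.digest()
        keys.append(digest.hex())
        parent = digest
    return keys


# Two requests sharing their first 8 tokens produce the same two block keys.
print(prefix_block_keys([1, 2, 3, 4, 5, 6, 7, 8, 9]))
print(prefix_block_keys([1, 2, 3, 4, 5, 6, 7, 8, 10, 11]))
```

Fixed-length digests keep key comparison cheap. Chaining the parent digest means that changing an early token changes every later key, which prevents one block's state from being reused under a different prefix.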