Concept

Decoding Network for KV Cache Generation

The decoding network that generates the key-value (KV) cache, denoted $\mathrm{Dec}_{\mathrm{kv}}(\cdot)$, has the same architecture as the standard decoding network used for token prediction. The two differ only in what they return: instead of the output representations of the tokens, $\mathrm{Dec}_{\mathrm{kv}}(\cdot)$ returns the KV cache, that is, the keys and values computed by the self-attention mechanism in every layer while the input is processed.
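The distinction can be illustrated with a minimal NumPy sketch. It is not the course's implementation: the names `init_params`, `_forward`, `dec`, and `dec_kv` are hypothetical, and the decoder is deliberately simplified (single-head causal self-attention with a residual connection, no layer normalization and no feed-forward sub-layer). The point it shows is that both functions run the same forward pass over the same parameters and differ only in the value they return.

```python
import numpy as np


def init_params(num_layers, d_model, seed=0):
    """Random projection matrices for a toy decoder (illustrative only)."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(d_model)
    return [
        {
            name: rng.normal(0.0, scale, size=(d_model, d_model))
            for name in ("Wq", "Wk", "Wv", "Wo")
        }
        for _ in range(num_layers)
    ]


def _forward(params, x):
    """Shared forward pass: returns final hidden states and per-layer (K, V)."""
    seq_len, d_model = x.shape
    # True above the diagonal: position i must not attend to positions j > i.
    causal_mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    kv_cache = []
    h = x
    for layer in params:
        q, k, v = h @ layer["Wq"], h @ layer["Wk"], h @ layer["Wv"]
        kv_cache.append((k, v))  # keys and values of this layer
        scores = (q @ k.T) / np.sqrt(d_model)
        scores = np.where(causal_mask, -np.inf, scores)
        scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
        weights = np.exp(scores)
        weights = weights / weights.sum(axis=-1, keepdims=True)
        h = h + (weights @ v) @ layer["Wo"]  # residual connection
    return h, kv_cache


def dec(params, x):
    """Standard decoder: returns the output representations of the tokens."""
    return _forward(params, x)[0]


def dec_kv(params, x):
    """Same network, but returns the multi-layer KV cache instead."""
    return _forward(params, x)[1]


# Usage: 5 input vectors of size 8 through a 2-layer toy decoder.
params = init_params(num_layers=2, d_model=8)
x = np.random.default_rng(1).normal(size=(5, 8))
outputs = dec(params, x)    # array of shape (5, 8)
cache = dec_kv(params, x)   # list of 2 (K, V) pairs, each array of shape (5, 8)
```

Because the attention is causal, the keys and values at a position depend only on that position and the ones before it, so the cache computed for a prefix can be reused when later tokens are appended. A real implementation would normally compute the outputs and the cache in a single pass rather than through two separate calls.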

Updated 2026-05-03

Tags

Foundations of Large Language Models

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
