Formula

Decoding Phase Goal Formula

In large language models, the objective of the decoding phase is to find the best predicted sequence of tokens. Instead of conditioning the prediction directly on the original input sequence, the generation process relies entirely on the contextual representation built during the preceding prefilling stage. The optimal predicted sequence, denoted as $\hat{\mathbf{y}}$, is the one that maximizes the conditional probability given this context: $\hat{\mathbf{y}} = \argmax_{\mathbf{y}} \Pr(\mathbf{y} \mid \mathrm{cache})$, where $\mathrm{cache}$ refers to the accumulated Key-Value (KV) cache.
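As a sketch of how this objective is usually unpacked for an autoregressive model, the sequence probability can be expanded with the chain rule and rewritten as a sum of log-probabilities. The symbols $n$ (output length), $y_i$ (the $i$-th output token), and $y_{<i}$ (the tokens generated before position $i$) are notation assumed here for illustration and do not appear in the formula above.

```latex
\begin{align}
\Pr(\mathbf{y} \mid \mathrm{cache})
  &= \prod_{i=1}^{n} \Pr(y_i \mid \mathrm{cache},\, y_{<i}) \\
\hat{\mathbf{y}}
  &= \argmax_{\mathbf{y}} \sum_{i=1}^{n} \log \Pr(y_i \mid \mathrm{cache},\, y_{<i})
\end{align}
```

Under this reading, each decoding step scores one next token given the cache and the tokens produced so far. Because the exact maximization ranges over all possible sequences, practical decoders typically approximate it, for example with greedy search or beam search.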


Updated 2026-05-03


Tags

Foundations of Large Language Models

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
