Activity (Process)

Decoding Phase in Transformer Inference

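Following the prefilling stage, the decoding phase generates the remaining tokens autoregressively, one at a time. At each step the model processes only the most recently generated token: it computes that token's query, key, and value, appends the new key and value to the KV cache, and attends over all cached key-value pairs, so the keys and values of earlier tokens are reused rather than recomputed.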
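
The decode loop can be illustrated with a minimal sketch. It assumes a single attention head and no real model weights: `toy_project` and `toy_next_token` are hypothetical stand-ins for the learned Q/K/V projections and the output layer, and `KVCache`, `prefill`, `decode_step`, and `generate` are illustrative names, not the API of any particular library. The point is only the control flow: prefill fills the cache once, then each decode step adds one key-value pair and attends over everything cached so far.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over all cached positions."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class KVCache:
    """Holds one key vector and one value vector per processed token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)


def toy_project(token_id, dim=4):
    """Hypothetical stand-in for learned query/key/value projections."""
    q = [math.sin(token_id * (i + 1)) for i in range(dim)]
    k = [math.cos(token_id + i) for i in range(dim)]
    v = [float(token_id + i) for i in range(dim)]
    return q, k, v


def prefill(prompt_ids, cache):
    """Process the whole (non-empty) prompt and fill the cache."""
    for token_id in prompt_ids:
        q, k, v = toy_project(token_id)
        cache.append(k, v)
    # Only the last prompt position is needed to predict the first new token.
    return attend(q, cache.keys, cache.values)


def decode_step(token_id, cache):
    """Process one new token: compute its q/k/v, extend the cache, attend."""
    q, k, v = toy_project(token_id)
    cache.append(k, v)
    return attend(q, cache.keys, cache.values)


def toy_next_token(context, vocab_size=10):
    """Hypothetical output layer: maps a context vector to a token id."""
    return int(abs(sum(context)) * 1000) % vocab_size


def generate(prompt_ids, max_new_tokens):
    cache = KVCache()
    context = prefill(prompt_ids, cache)
    generated = []
    for _ in range(max_new_tokens):
        token = toy_next_token(context)
        generated.append(token)
        # Earlier keys and values are read from the cache, not recomputed.
        context = decode_step(token, cache)
    return generated, cache
```

Calling `generate([3, 1, 4], max_new_tokens=5)` returns five token ids and a cache holding one key-value pair per processed token. In this sketch each decode step computes projections for a single token, while the attention cost of a step still grows with the number of cached positions.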

Updated 2026-05-03

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
