Concept

Factors Contributing to High Decoding Cost

Decoding is more computationally expensive than prefilling, but not only because it generates tokens sequentially, one at a time, and updates the KV cache at every step. These factors contribute, yet the full explanation for the cost involves deeper underlying reasons.
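One commonly cited reason of this kind is that a decoding step does very little arithmetic per byte of memory it reads, so it tends to be limited by memory bandwidth rather than by compute. The following is a minimal sketch of that idea for a single square linear layer, not a model of a real system. The function name `arithmetic_intensity`, the layer shape, the 2-byte (fp16) value size, and the example sizes are all illustrative assumptions; attention and KV-cache traffic are ignored.

```python
def arithmetic_intensity(num_tokens: int, d_model: int, bytes_per_value: int = 2) -> float:
    """FLOPs per byte moved for one d_model x d_model linear layer.

    Hypothetical cost model (assumption): the weight matrix is read once,
    and each token's input and output activation vectors are moved once.
    """
    # One multiply and one add per weight, per token.
    flops = 2 * num_tokens * d_model * d_model
    # The full weight matrix is read regardless of how many tokens are processed.
    weight_bytes = d_model * d_model * bytes_per_value
    # Input and output activations: two vectors of length d_model per token.
    activation_bytes = 2 * num_tokens * d_model * bytes_per_value
    return flops / (weight_bytes + activation_bytes)


# Assumed sizes: a 1024-token prompt and a hidden size of 4096.
prefill_intensity = arithmetic_intensity(num_tokens=1024, d_model=4096)
# A decoding step processes a single new token.
decode_intensity = arithmetic_intensity(num_tokens=1, d_model=4096)

print(f"prefill: {prefill_intensity:.1f} FLOPs/byte")
print(f"decode:  {decode_intensity:.3f} FLOPs/byte")
```

Under these assumptions, prefilling reuses each loaded weight across every prompt token, while a decoding step reloads the same weights to process one token, so its FLOPs-per-byte ratio is far lower.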

Updated 2026-01-15

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences