Analysis of Computational Costs in Transformer Inference
Explain why the decoding phase in a Transformer model's inference process is typically more computationally expensive than the prefilling phase. Go beyond simply stating that it's a sequential process and identify at least two distinct contributing factors.
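The cost gap the question asks about can be made concrete with a small numeric sketch. This is an illustrative Python example, not part of the card: it assumes a hypothetical hidden size of 4096, fp16 weights, and a single dense layer, and compares the arithmetic intensity (FLOPs per byte of weight traffic) of prefill, which processes all prompt tokens in one batched matmul, against decode, which re-reads the weights for every generated token.

```python
# Illustrative sketch (hypothetical numbers, not from the card):
# arithmetic intensity of prefill vs. decode for one weight-matrix multiply.

d = 4096          # hypothetical hidden size (d_model)
n_tokens = 200    # prompt length / response length from the scenario

def arithmetic_intensity(batch_tokens: int) -> float:
    """FLOPs per byte of weight traffic for multiplying a
    (batch_tokens x d) activation block by a (d x d) weight matrix."""
    flops = 2 * batch_tokens * d * d      # one multiply-add per weight per token
    weight_bytes = d * d * 2              # fp16 weights, read once per pass
    return flops / weight_bytes

# Prefill: all 200 prompt tokens amortize a single weight read -> high intensity.
prefill_ai = arithmetic_intensity(n_tokens)

# Decode: each new token forces its own full weight read -> low intensity,
# so the same matrix is streamed from memory 200 times over the response.
decode_ai = arithmetic_intensity(1)

print(prefill_ai / decode_ai)  # -> 200.0: prefill does 200x more work per byte moved
```

Under these assumptions, decode is memory-bandwidth-bound (intensity ~1 FLOP/byte) while prefill is compute-bound, which, together with the growing KV cache read at every step, is the kind of distinct factor the prompt asks for beyond mere sequentiality.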
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Increased Complexity and Cost from Exploring Multiple Decoding Paths
Inference Performance Bottleneck Analysis
Factors Contributing to High Decoding Cost
An engineer observes that generating a 200-token response from a large language model takes significantly more time than processing the initial 200-token input prompt. Which of the following statements provides the most accurate technical explanation for this performance difference?