Performance Bottleneck in a Generative Model
A developer is profiling a text-generation model built on a multi-layer Transformer architecture. During autoregressive generation, the measured time is dominated by the self-attention computations. Based on the profiling data, identify the mathematical relationship between the generated sequence length and the computation time, and explain why this relationship occurs.
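One way to see the relationship is to count the query-key dot products performed during decoding. The sketch below is illustrative, not a real profiler: it assumes a KV cache, so at step t the newly generated token's query attends to all t cached keys.

```python
# Sketch: count query-key dot products during autoregressive decoding
# with a KV cache (illustrative cost model, not a real profiler).

def attention_ops(seq_len: int) -> int:
    """Dot products needed to generate seq_len tokens one at a time."""
    total = 0
    for t in range(1, seq_len + 1):
        # Step t: the new token's query attends to all t cached keys.
        total += t
    return total

for n in (128, 256, 512):
    print(n, attention_ops(n))
```

The total is n(n+1)/2, so doubling the sequence length roughly quadruples the attention work: the relationship between sequence length and computation time is quadratic, O(n^2).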
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A team is optimizing a text-generation model where the computational cost is dominated by the self-attention mechanism during autoregressive decoding. They need to decide between two potential upgrades:
- Upgrade A: Doubling the number of layers in the model while keeping the maximum sequence length the same.
- Upgrade B: Doubling the maximum sequence length the model can handle while keeping the number of layers the same.
Assuming the model generates a sequence that fills its maximum length capacity in both scenarios, which upgrade would lead to a greater increase in the total computation time, and what is the nature of that increase?
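Under the simple cost model where total attention work scales with the number of layers times the square of the sequence length, the two upgrades can be compared directly. The baseline figures below (12 layers, 1024 tokens) are hypothetical, chosen only to make the ratios concrete:

```python
# Simple cost model: total self-attention work ~ num_layers * seq_len^2.
# Baseline numbers are hypothetical, for illustration only.

def attention_cost(num_layers: int, seq_len: int) -> int:
    return num_layers * seq_len ** 2

base      = attention_cost(12, 1024)
upgrade_a = attention_cost(24, 1024)  # double layers: cost scales linearly
upgrade_b = attention_cost(12, 2048)  # double length: cost scales quadratically

print(upgrade_a / base)  # 2.0
print(upgrade_b / base)  # 4.0
```

Doubling the layer count doubles the cost (a linear increase), while doubling the maximum sequence length quadruples it (a quadratic increase), so Upgrade B causes the greater increase in total computation time.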
Derivation of Quadratic Complexity in Autoregressive Attention
Vector Products per Self-Attention Step