Causation

Quadratic Complexity's Impact on Transformer Inference Speed

The self-attention mechanism compares every token with every other token, so computing attention over a sequence of length n costs O(n²) time and memory. As a result, Transformer inference slows down quadratically as sequence length grows. This cost is particularly pronounced for long sequences, making the standard architecture inefficient for such tasks and motivating the development of faster, more efficient attention variants and architectures.
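The quadratic cost is easy to see in a naive implementation: the attention score matrix alone holds n × n entries. Below is a minimal NumPy sketch of single-head scaled dot-product attention (the function name, shapes, and the chosen sequence lengths are illustrative assumptions, not from the source); doubling the sequence length roughly quadruples the work.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Single-head scaled dot-product attention over (n, d) matrices."""
    # The score matrix is (n, n): every query attends to every key,
    # so time and memory both grow quadratically with sequence length n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # O(n^2 * d) multiply-adds
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # another O(n^2 * d) step

# Illustrative run: each doubling of n quadruples the score-matrix size.
d = 64
for n in (512, 1024, 2048):
    Q = K = V = np.random.randn(n, d)
    _ = naive_attention(Q, K, V)
    print(f"n={n}: score matrix holds {n * n:,} entries")
```

During autoregressive decoding this cost is paid repeatedly: generating token n + 1 requires attending over all n previous tokens, which is why long-context inference becomes the dominant bottleneck.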
