Problem

Computational Infeasibility of Standard Transformers for Long Sequences

The standard Transformer architecture is fundamentally ill-suited to processing very long sequences because of its computational demands. The core issue is the self-attention mechanism, which compares every token with every other token, so its time and memory costs grow quadratically with sequence length (O(n^2) for n tokens). This quadratic scaling makes it practically infeasible to train or deploy standard Transformers on extremely long inputs.
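The sketch below illustrates where the quadratic term comes from. It is a minimal single-head self-attention in NumPy; the names (X, Wq, Wk, Wv) are illustrative, not from any particular codebase. The n x n score matrix is the bottleneck: its size, and the work to fill it, grow with the square of the sequence length.

```python
import numpy as np

def naive_self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over n token embeddings.

    X: (n, d) input embeddings; Wq, Wk, Wv: (d, d) projection matrices.
    The score matrix S is (n, n), so time and memory grow as O(n^2).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv       # each (n, d)
    S = Q @ K.T / np.sqrt(K.shape[-1])     # (n, n) pairwise token scores
    S = S - S.max(axis=-1, keepdims=True)  # subtract row max for stability
    A = np.exp(S)
    A = A / A.sum(axis=-1, keepdims=True)  # row-wise softmax -> attention weights
    return A @ V                           # (n, d) attended outputs

# Doubling n quadruples the score matrix:
# n = 4,096  -> ~16.8M scores per head per layer
# n = 32,768 -> ~1.07B scores (~4 GB in float32) per head per layer
```

Since this cost is paid per head and per layer, and again at every generation step without caching tricks, long contexts quickly exhaust both compute and memory budgets.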
