Learn Before
  • Transformer

Computational Cost of Self-Attention in Transformers

The self-attention mechanism, a core component of the Transformer architecture, has a computational cost that scales quadratically with the length of the input sequence: every token attends to every other token, so the model must compute and store an n × n matrix of attention scores, costing O(n² · d) time and O(n²) memory for a sequence of n tokens with hidden size d. This quadratic scaling makes it prohibitively expensive to train or deploy Transformer-based models on tasks involving very long texts.
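
To make the scaling concrete, below is a minimal NumPy sketch of single-head self-attention. The function name, toy dimensions, and the loop over sequence lengths are illustrative rather than taken from the course material; the point is that the n × n score matrix Q Kᵀ is where the quadratic term appears.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention over a sequence of n token embeddings.

    X: (n, d) inputs; W_q, W_k, W_v: (d, d) projection matrices.
    The score matrix Q @ K.T has shape (n, n), so both computing
    and storing it grow quadratically with the sequence length n.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v              # each (n, d): O(n * d^2)
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (n, n):      O(n^2 * d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (n, d):      O(n^2 * d)

# Illustrative toy run: doubling n quadruples the score-matrix work.
d = 64
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
for n in (512, 1024, 2048):
    X = rng.standard_normal((n, d))
    out = self_attention(X, W_q, W_k, W_v)
    print(n, out.shape, f"score-matrix entries: {n * n:,}")
```

Doubling the sequence length from 1,024 to 2,048 quadruples the number of score-matrix entries (from roughly one million to roughly four million), which is why contexts of tens of thousands of tokens become costly without architectural changes.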


Tags
  • Ch.2 Generative Models - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Foundations of Large Language Models Course
  • Computing Sciences
Related
  • Self-attention layers' first approach

  • Transformers in contextual generation and summarization

  • Huggingface Model Summary

  • A Survey of Transformers (Lin et al., 2021)

  • Overview of a Transformer

  • Model Usage of Transformers

  • Attention in vanilla Transformers

  • Transformer Variants (X-formers)

  • The Pre-training and Fine-tuning Paradigm

  • Architectural Categories of Pre-trained Transformers

  • Transformer Blocks and Post-Norm Architecture

  • Model Depth (L) in Transformers

  • Transformers as Language Models

  • Computational Cost of Self-Attention in Transformers

Learn After
  • KV Cache during Transformer Inference

  • Architectural Adaptation of LLMs for Long Sequences

  • Cross-Layer Parameter Sharing in Transformers