Learn Before
  • Attention-level improvements of Transformers

Concept

Low Rank Self-Attention

The self-attention matrix $A \in \mathbb{R}^{T \times T}$ has been observed to be low rank, meaning that the rank of $A$ is often far lower than the input length $T$. This suggests that the low-rank property can be modeled explicitly through parameterization: low-rank self-attention replaces the full self-attention matrix with a low-rank approximation.
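As an illustration, below is a minimal sketch (in PyTorch) of one common way to parameterize the low-rank property explicitly: keys and values are projected along the sequence dimension from length T down to a small rank k, in the style of Linformer, so the resulting attention map has rank at most k. The class and parameter names (LowRankSelfAttention, proj_k, proj_v, rank) are assumptions made for this example, not definitions from the cited survey.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LowRankSelfAttention(nn.Module):
        """Self-attention whose T x T score matrix is replaced by a T x k one."""

        def __init__(self, d_model: int, seq_len: int, rank: int):
            super().__init__()
            self.q = nn.Linear(d_model, d_model)
            self.k = nn.Linear(d_model, d_model)
            self.v = nn.Linear(d_model, d_model)
            # Learned projections that compress the sequence length T -> rank k.
            self.proj_k = nn.Linear(seq_len, rank, bias=False)
            self.proj_v = nn.Linear(seq_len, rank, bias=False)
            self.scale = d_model ** -0.5

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, T, d_model)
            Q, K, V = self.q(x), self.k(x), self.v(x)
            # Compress K and V along the sequence axis: (batch, T, d) -> (batch, k, d).
            K = self.proj_k(K.transpose(1, 2)).transpose(1, 2)
            V = self.proj_v(V.transpose(1, 2)).transpose(1, 2)
            # Scores are (batch, T, k) instead of (batch, T, T), so the implied
            # T x T attention map has rank at most k.
            A = F.softmax((Q @ K.transpose(1, 2)) * self.scale, dim=-1)
            return A @ V  # (batch, T, d_model)

    # Usage sketch: a length-128 sequence approximated with rank k = 16.
    attn = LowRankSelfAttention(d_model=64, seq_len=128, rank=16)
    y = attn(torch.randn(2, 128, 64))  # -> shape (2, 128, 64)

With this parameterization, the attention computation scales with T·k per head rather than T², which is the main practical payoff of assuming the attention matrix is low rank.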

Updated 2022-05-20

Contributors:

Adam Nik (Carleton College)

References


  • A Survey of Transformers (Lin et al., 2021)

Tags

Data Science

Related
  • Sparse Attention

  • Query Prototyping and Memory Compression

  • Low Rank Self-Attention

  • Attention with Prior

  • Improved Multi-Head Attention Mechanism

  • Linear Attention

  • A research team is working to reduce the computational cost of the attention mechanism for processing extremely long documents. Their proposed solution involves modifying the attention calculation so that each query token only computes attention scores with a small, fixed subset of key tokens (e.g., neighboring tokens and a few globally important tokens) instead of all tokens in the sequence. Which category of attention improvement best describes this approach?

  • Match each attention improvement strategy with its core operational principle.

  • Optimizing Transformer Attention for Long Sequences

  • Evaluating Attention Optimization Strategies for Specific Applications
