Concept

Low-Rank Self-Attention

The self-attention matrix $\mathbf{A} \in \mathbb{R}^{T \times T}$ has been observed to be low-rank, meaning that the rank of $\mathbf{A}$ is far lower than the input sequence length $T$. This suggests that the low-rank structure can be modeled explicitly through the parameterization of the attention mechanism. Low-rank self-attention is an efficiency improvement in which the standard self-attention matrix is replaced by a low-rank approximation.
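As a rough illustration of what replacing $\mathbf{A}$ with a low-rank approximation means, the sketch below builds a softmax attention matrix from random queries and keys and approximates it with a truncated SVD. The function names, the sizes `T = 64` and `d = 8`, and the choice of truncated SVD are assumptions made for this example only; they are not taken from any specific low-rank attention method.

```python
import numpy as np

def softmax_attention(Q, K):
    """Row-wise softmax of scaled dot-product scores; returns a (T, T) matrix."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

def truncated_svd_approx(A, k):
    """Rank-k approximation of A built from its k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Illustrative sizes (assumed): sequence length T, head dimension d.
rng = np.random.default_rng(0)
T, d = 64, 8
Q = rng.standard_normal((T, d))
K = rng.standard_normal((T, d))
A = softmax_attention(Q, K)

# Compare the full matrix with approximations of increasing rank.
for k in (4, 16, T):
    A_k = truncated_svd_approx(A, k)
    rel_err = np.linalg.norm(A - A_k) / np.linalg.norm(A)
    print(f"rank {k:2d}: relative Frobenius error = {rel_err:.4f}")
```

This sketch forms the full $T \times T$ matrix before approximating it, so it only demonstrates the low-rank property and does not save any computation by itself. An efficient variant would need to avoid materializing $\mathbf{A}$ in the first place.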

Updated 2026-06-16

Tags

Data Science