Learn Before
Low Rank Self-Attention
The self-attention matrix A has been observed to be low-rank, meaning that its rank is far lower than the input length 𝑇. This implies that the low-rank property can be explicitly modeled through parameterization. Low-rank self-attention replaces the full self-attention matrix with a low-rank approximation.
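As a minimal sketch of the idea, the keys and values can be projected down to a fixed number of rows k ≪ 𝑇 before attention is computed, so the attention matrix is 𝑇×k instead of 𝑇×𝑇 (the projection matrices `E` and `F` and their random initialization here are illustrative assumptions, in the spirit of Linformer-style low-rank attention):

```python
import numpy as np

def low_rank_attention(Q, K, V, E, F):
    """Sketch of low-rank self-attention.

    Q, K, V: (T, d) query/key/value matrices.
    E, F:    (k, T) learned projection matrices (hypothetical here),
             with k << T, that compress keys and values along the
             sequence dimension.
    """
    d = Q.shape[-1]
    K_low = E @ K                     # (k, d) compressed keys
    V_low = F @ V                     # (k, d) compressed values
    scores = Q @ K_low.T / np.sqrt(d) # (T, k) instead of (T, T)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)            # row-wise softmax
    return A @ V_low                  # (T, d) output, rank <= k

# Toy usage: sequence length T, model dim d, low rank k
T, d, k = 512, 64, 32
rng = np.random.default_rng(0)
Q = rng.standard_normal((T, d))
K = rng.standard_normal((T, d))
V = rng.standard_normal((T, d))
E = rng.standard_normal((k, T)) / np.sqrt(T)
F = rng.standard_normal((k, T)) / np.sqrt(T)
out = low_rank_attention(Q, K, V, E, F)
```

Because the attention matrix has only k columns, the score computation costs O(𝑇·k·d) rather than O(𝑇²·d), which is the source of the savings for long sequences.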
Tags
Data Science
Related
Sparse Attention
Query Prototyping and Memory Compression
Low Rank Self-Attention
Attention with Prior
Improved Multi-Head Attention Mechanism
Linear Attention
A research team is working to reduce the computational cost of the attention mechanism for processing extremely long documents. Their proposed solution involves modifying the attention calculation so that each query token only computes attention scores with a small, fixed subset of key tokens (e.g., neighboring tokens and a few globally important tokens) instead of all tokens in the sequence. Which category of attention improvement best describes this approach?
Match each attention improvement strategy with its core operational principle.
Optimizing Transformer Attention for Long Sequences
Evaluating Attention Optimization Strategies for Specific Applications