1Cademy - Self-Attention Output Formula

Self-Attention Output Formula

Given a sequence of input tokens $\mathbf{x}_1, \ldots, \mathbf{x}_n$, where each token $\mathbf{x}_i \in \mathbb{R}^d$ for $1 \leq i \leq n$, the self-attention mechanism produces an output sequence of the same length, denoted $\mathbf{y}_1, \ldots, \mathbf{y}_n$. Each output vector $\mathbf{y}_i$ is computed by treating the token $\mathbf{x}_i$ as the query and the entire sequence of tokens as both the keys and the values. This is mathematically defined as:
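
One way to make the description above concrete is to assume that the general attention formula takes the scaled dot-product form with no learned projections. This is an assumption for illustration, and the general formula may be defined differently. Under it, $\mathbf{y}_i = \sum_{j=1}^{n} \alpha_{i,j}\,\mathbf{x}_j$, where the weights $\alpha_{i,1}, \ldots, \alpha_{i,n}$ are the softmax over $j$ of $\mathbf{x}_i^\top \mathbf{x}_j / \sqrt{d}$. The Python sketch below follows that assumption; the names `softmax` and `self_attention` are hypothetical helpers, not part of any library.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(xs):
    """Return y_1..y_n for tokens x_1..x_n (each a list of d floats).

    Assumption: scaled dot-product attention with no learned projections,
    so each token serves directly as query, key, and value.
    """
    d = len(xs[0])
    ys = []
    for query in xs:  # x_i acts as the query
        # Similarity of the query to every key x_j, scaled by sqrt(d).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in xs
        ]
        weights = softmax(scores)  # attention weights, summing to 1
        # y_i is the weighted sum of the value vectors x_j.
        y = [sum(w * value[t] for w, value in zip(weights, xs)) for t in range(d)]
        ys.append(y)
    return ys


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs = self_attention(tokens)  # one output vector per input token
```

Because each $\mathbf{y}_i$ in this sketch is a convex combination of the input tokens, the output has the same length $n$ and dimension $d$ as the input, and a sequence of identical tokens is mapped to itself.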