Learn Before
Self-Attention Formula for the Prefilling Phase
During the prefilling phase, self-attention is computed for the entire input sequence in a single operation. The query, key, and value vectors are represented as the matrices Q, K, and V. The attention output is calculated using the scaled dot-product formula:

Attention(Q, K, V) = Softmax((QK^T / sqrt(d)) + Mask)V

Here, the causal mask, Mask, prevents tokens from attending to future positions by setting the corresponding entries in the attention score matrix to a large negative number (e.g., -∞) before the Softmax function is applied.
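As a concrete illustration, here is a minimal NumPy sketch of this computation. The function name prefill_attention, the toy dimensions, and the use of -inf for the masked entries are assumptions for illustration, not details taken from the card.

```python
import numpy as np

def prefill_attention(Q, K, V):
    """Masked scaled dot-product attention over a full input sequence.

    Q, K, V: (n, d) matrices packing the query/key/value vectors of all
    n prompt tokens, so attention is computed in a single operation.
    (Sketch only; names and dimensions are illustrative assumptions.)
    """
    n, d = Q.shape
    # QK^T / sqrt(d): interaction scores for every pair of tokens,
    # computed at once because the whole prompt is available.
    scores = Q @ K.T / np.sqrt(d)
    # Causal mask: -inf above the diagonal, so after Softmax no token
    # attends to a future position.
    mask = np.triu(np.full((n, n), -np.inf), k=1)
    scores = scores + mask
    # Row-wise Softmax; the -inf entries become zero attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Example with arbitrary toy sizes: 4 prompt tokens, head dimension 8.
rng = np.random.default_rng(0)
n, d = 4, 8
out = prefill_attention(rng.normal(size=(n, d)),
                        rng.normal(size=(n, d)),
                        rng.normal(size=(n, d)))
print(out.shape)  # (4, 8): one output vector per prompt token
```

Note that the single Q @ K.T matrix multiplication is what produces all n×n pairwise scores simultaneously, which is why prefilling can process the entire prompt in parallel rather than token by token.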

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Self-Attention Formula for the Prefilling Phase
Prefilling as a Compute-Bound Process
Token Prediction within the Prefilling Phase
When a large language model first processes a user's prompt, it can perform calculations for all words in the prompt simultaneously rather than one by one. What is the fundamental condition that makes this highly parallel approach possible during this initial stage?
LLM Inference Performance Analysis
Rationale for Parallelism in Initial Prompt Processing
Diagram of the Prefilling Phase
Learn After
The scaled dot-product attention formula, Softmax((QK^T / sqrt(d)) + Mask)V, is used when an entire input sequence is available for simultaneous processing. Which specific operation within this formula directly represents the parallel computation of interaction scores between every possible pair of tokens in the sequence, a step that is only feasible because the entire input is present at once?
Optimizing Prefilling Phase Performance
Consequences of Removing the Causal Mask