
Formula for Causal Attention

The output of a causal attention mechanism for a query vector $\mathbf{q}_i$ is a weighted sum of the value vectors $\mathbf{v}_j$ at all positions $j$ up to and including the current position $i$:

$$\text{Att}_{\text{qkv}}(\mathbf{q}_i, \mathbf{K}_{\leq i}, \mathbf{V}_{\leq i}) = \sum_{j=0}^{i} \alpha(i, j)\, \mathbf{v}_j$$

Here, $\alpha(i, j)$ is the attention weight assigned to the value vector at position $j$ when computing the output for position $i$.
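The weighted sum above can be sketched in NumPy for a single position $i$. The excerpt does not define $\alpha(i, j)$, so this sketch assumes the common scaled dot-product softmax weighting; the function name and argument names are illustrative, not from the source.

```python
import numpy as np

def causal_attention_output(q_i, K_le_i, V_le_i):
    """Return sum_j alpha(i, j) * v_j over positions j <= i.

    Assumption (not stated in the excerpt): alpha(i, j) is a
    softmax over scaled dot products q_i . k_j, one common choice.
    q_i:    (d,)   query vector at position i
    K_le_i: (i+1, d) key vectors for positions 0..i
    V_le_i: (i+1, d_v) value vectors for positions 0..i
    """
    d = q_i.shape[-1]
    scores = K_le_i @ q_i / np.sqrt(d)   # one score per position j <= i
    scores -= scores.max()               # subtract max for numerical stability
    weights = np.exp(scores)
    alpha = weights / weights.sum()      # softmax: weights sum to 1
    return alpha @ V_le_i                # weighted sum of value vectors
```

Because the weights $\alpha(i, j)$ sum to 1, the output is a convex combination of the value vectors: if every $\mathbf{v}_j$ were identical, the output would equal that shared vector regardless of the attention scores.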

Updated 2026-04-23


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Computing Sciences