Formula

Attention Weight Formula ($\alpha_{i,j}$)

The attention weight, denoted $\alpha_{i,j}$, is obtained by applying the Softmax function to the pre-softmax attention score $\beta_{i,j}$. This normalization step converts the raw scores into a probability distribution, ensuring that for a given position $i$ the weights sum to one across all positions $j'$. The formula is expressed as:

$$\alpha_{i,j} = \mathrm{Softmax}(\beta_{i,j}) = \frac{\exp(\beta_{i,j})}{\sum_{j'} \exp(\beta_{i,j'})}$$
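The normalization above can be sketched in plain Python. This is a minimal illustration, not an implementation from the text: `beta_i` is a hypothetical row of pre-softmax scores $\beta_{i,j}$ for a single position $i$, and the row maximum is subtracted before exponentiating for numerical stability (a standard trick that leaves the result unchanged, since it cancels in the ratio).

```python
import math

def softmax_row(beta_row):
    """Turn one row of raw attention scores into weights that sum to 1."""
    # Subtract the row max before exp() for numerical stability;
    # the shift cancels between numerator and denominator.
    m = max(beta_row)
    exps = [math.exp(b - m) for b in beta_row]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores beta_{i,j} for one query position i.
beta_i = [2.0, 1.0, 0.1]
alpha_i = softmax_row(beta_i)

print(alpha_i)       # the attention weights alpha_{i,j}
print(sum(alpha_i))  # sums to 1.0, as a probability distribution must
```

Note that softmax preserves the ordering of the scores: the largest $\beta_{i,j}$ receives the largest weight $\alpha_{i,j}$.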

Updated 2026-04-22

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences