Short Answer

Calculating and Interpreting Attention Weights

In a simplified self-attention mechanism, the unnormalized attention scores (β) for a query token with respect to three key tokens in a sequence are [4.0, 2.0, 1.0]. Calculate the final normalized attention weight (α) that the query token places on the second key token (the one with a score of 2.0). Briefly explain the primary purpose of applying this normalization step to the raw scores. Show your calculation.
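Assuming the normalization intended here is the standard softmax (the usual choice in self-attention; the question does not name the function explicitly), the calculation can be sketched in Python. Softmax maps the raw scores to non-negative weights that sum to 1, which is the usual purpose of this normalization step:

```python
import math

def softmax(scores):
    """Normalize raw attention scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

beta = [4.0, 2.0, 1.0]   # unnormalized scores from the question
alpha = softmax(beta)
# alpha[1] is the weight the query places on the second key token (score 2.0)
print(round(alpha[1], 4))  # 0.1142
```

By hand: α₂ = e² / (e⁴ + e² + e¹) ≈ 7.389 / 64.705 ≈ 0.114.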

Updated 2025-10-08

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science