Attention Weight Formula (α)
The attention weight, denoted as α, is obtained by applying the Softmax function to the pre-softmax attention score β. This normalization step converts the raw scores into a probability distribution, ensuring that the weights for a given query position i sum to one across all key positions j. The formula is expressed as:

α_{i,j} = Softmax(β_{i,j}) = exp(β_{i,j}) / Σ_{j'} exp(β_{i,j'})
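A minimal sketch of this normalization in Python (NumPy assumed; the helper name attention_weights is mine, not from the source). It applies the softmax to the raw scores β for a single query and returns the weights α:

```python
import numpy as np

def attention_weights(scores):
    """Softmax over un-normalized attention scores (beta) -> weights (alpha).

    `scores` is a 1-D array of raw scores for one query against all key
    positions. Subtracting the max before exponentiating is the usual
    numerically stable form; it does not change the result because the
    softmax is invariant to adding a constant to every score.
    """
    scores = np.asarray(scores, dtype=float)
    exps = np.exp(scores - scores.max())
    return exps / exps.sum()

# Worked example matching the related card below: raw scores [1.0, 2.0, 3.0]
# for the query at position 2; the weight assigned to position 1 is the
# middle entry (approx. 0.2447).
alpha = attention_weights([1.0, 2.0, 3.0])
print(alpha)        # approx [0.0900, 0.2447, 0.6652]
print(alpha.sum())  # 1.0
```

The shift-invariance used for numerical stability here is also the point of the constant-offset questions listed under Related and Learn After: adding the same constant to every raw score leaves the normalized weights unchanged.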
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Scaled Dot-Product Attention
Causal Self-Attention in Autoregressive Decoders
A model is processing a sequence of three tokens. For the query at position 2, the un-normalized attention scores with respect to the keys at positions 0, 1, and 2 are calculated as [1.0, 2.0, 3.0] respectively. What is the final attention weight that the token at position 2 will assign to the token at position 1?
Attention Output as a Weighted Sum of Values
Impact of Masking on Attention Weight Distribution
True or False: In a self-attention mechanism, if you add the same constant value to all un-normalized attention scores corresponding to a single query vector, the final normalized attention weights for that query will change.
Attention Weight Formula (α)
Learn After
In a self-attention mechanism, the raw attention scores (β) for a single query vector with respect to three key vectors are calculated as [2.0, 1.0, 0.5]. To convert these scores into a probability distribution, a normalization function is applied. What is the resulting normalized attention weight (α) corresponding to the first key vector (score of 2.0)?
In a self-attention mechanism, a set of raw, unnormalized attention scores for a specific query are [1.5, 0.5, -1.0]. If a constant value of 10 is added to each of these scores, resulting in a new set of scores [11.5, 10.5, 9.0], how will the final normalized attention weights (the probability distribution) calculated from the new scores compare to the weights calculated from the original scores?
Calculating and Interpreting Attention Weights
Self-Attention Output Formula for a Single Query
Computing Attention Weights in Sequence Parallelism
Distributed Attention Weight Formula