Learn Before
In a self-attention mechanism, the raw attention scores (β) for a single query vector with respect to three key vectors are calculated as [2.0, 1.0, 0.5]. To convert these scores into a probability distribution, a normalization function is applied. What is the resulting normalized attention weight (α) corresponding to the first key vector (score of 2.0)?
≈ 0.63. Applying the softmax normalization: α₁ = e^2.0 / (e^2.0 + e^1.0 + e^0.5) = 7.389 / (7.389 + 2.718 + 1.649) = 7.389 / 11.756 ≈ 0.6285.
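A minimal numeric check of this result, assuming the normalization function is the standard softmax (variable names here are illustrative, not from the source):

```python
import math

# Raw attention scores (beta) for one query against three keys.
raw_scores = [2.0, 1.0, 0.5]

# Softmax: exponentiate each score, then divide by the sum so the
# resulting weights form a probability distribution.
exps = [math.exp(s) for s in raw_scores]
total = sum(exps)
alpha = [e / total for e in exps]

print(alpha[0])  # ~0.6285 -> normalized weight for the first key (score 2.0)
```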
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a self-attention mechanism, a set of raw, unnormalized attention scores for a specific query are [1.5, 0.5, -1.0]. If a constant value of 10 is added to each of these scores, resulting in a new set of scores [11.5, 10.5, 9.0], how will the final normalized attention weights (the probability distribution) calculated from the new scores compare to the weights calculated from the original scores? (See the numeric check after this list.)
Calculating and Interpreting Attention Weights
Self-Attention Output Formula for a Single Query
Computing Attention Weights in Sequence Parallelism
Distributed Attention Weight Formula
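The second related card above hinges on the shift invariance of softmax: since e^(s+c) = e^c · e^s, a constant added to every score cancels in the ratio and the distribution is unchanged. A quick sketch verifying this, again assuming softmax as the normalization:

```python
import math

def softmax(scores):
    # Exponentiate and normalize so the weights sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

original = softmax([1.5, 0.5, -1.0])
shifted = softmax([11.5, 10.5, 9.0])  # every score shifted by +10

print(original)  # ~[0.690, 0.254, 0.057]
print(shifted)   # same distribution, up to floating-point error
```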