A model is processing a sequence of three tokens. For the query at position 2, the un-normalized attention scores with respect to the keys at positions 0, 1, and 2 are calculated as [1.0, 2.0, 3.0] respectively. What is the final attention weight that the token at position 2 will assign to the token at position 1?
≈ 0.245. The attention weights are the softmax of the un-normalized scores, so the weight for position 1 is e^2.0 / (e^1.0 + e^2.0 + e^3.0) ≈ 7.389 / 30.193 ≈ 0.245.
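The calculation can be checked with a minimal sketch (a plain softmax over the three scores; the function name is my own, not from a specific library):

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability; this shift does not
    # change the normalized result.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Un-normalized attention scores for the query at position 2,
# against the keys at positions 0, 1, and 2.
scores = [1.0, 2.0, 3.0]
weights = softmax(scores)
print(weights[1])  # weight assigned to position 1, ≈ 0.2447
```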
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Scaled Dot-Product Attention
Causal Self-Attention in Autoregressive Decoders
Attention Output as a Weighted Sum of Values
Impact of Masking on Attention Weight Distribution
True or False: In a self-attention mechanism, if you add the same constant value to all un-normalized attention scores corresponding to a single query vector, the final normalized attention weights for that query will change.
Attention Weight Formula
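The related "True or False" question above turns on a property worth verifying directly: softmax is shift-invariant, so adding the same constant to every un-normalized score for a query leaves the normalized weights unchanged (the statement is therefore false). A minimal sketch (helper name is my own):

```python
import math

def softmax(scores):
    # Standard softmax with the usual max-subtraction for stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

base = softmax([1.0, 2.0, 3.0])
shifted = softmax([1.0 + 5.0, 2.0 + 5.0, 3.0 + 5.0])

# Adding a constant (here 5.0) to every score multiplies each
# exponentiated term by e^5.0, which cancels in the normalization.
assert all(abs(a - b) < 1e-12 for a, b in zip(base, shifted))
```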