Learn Before
Calculating a Scaled Attention Score
In a transformer's self-attention layer, a query vector q = [2, 0, 1, 3] is being compared to a key vector k = [1, 2, 2, 1]. The dimensionality of these vectors (d_k) is 4. Based on the standard scaled dot-product attention mechanism, what is the resulting unnormalized attention score? Provide the final numerical answer.
3.5 (the dot product q · k = 2·1 + 0·2 + 1·2 + 3·1 = 7, which is then scaled by √d_k = √4 = 2, giving 7/2 = 3.5).
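The arithmetic in the question can be checked with a short sketch, using only the query vector, key vector, and d_k given above:

```python
import math

# Query and key vectors from the question; d_k = 4.
q = [2, 0, 1, 3]
k = [1, 2, 2, 1]
d_k = len(q)

# Raw dot product: 2*1 + 0*2 + 1*2 + 3*1 = 7
dot = sum(qi * ki for qi, ki in zip(q, k))

# Scaled attention score: 7 / sqrt(4) = 3.5
score = dot / math.sqrt(d_k)
print(score)
```

This prints the unnormalized (pre-softmax) attention score for this query-key pair.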
Tags
Data Science
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An engineer is designing a self-attention layer for a text processing model. They notice that as they increase the dimensionality (d_k) of the query and key vectors, the training process becomes unstable and the gradients used for learning become extremely small. Which of the following best explains this phenomenon and the standard solution implemented within the attention mechanism?
A transformer's self-attention layer calculates an output vector for each input token. Arrange the following computational steps in the correct sequence to produce a single output vector, based on its query vector and the full set of key and value vectors for the input sequence.
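The step sequence described above can be sketched for a single query as follows; the function name and the toy inputs in the test are illustrative, not part of the question:

```python
import math

def attention_output(q, keys, values):
    """Scaled dot-product attention for one query vector.

    Steps, in order:
      1. Dot the query with every key vector.
      2. Scale each score by sqrt(d_k).
      3. Softmax the scaled scores into attention weights.
      4. Return the weighted sum of the value vectors.
    """
    d_k = len(q)
    # 1. Dot product of the query with each key.
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    # 2. Scale by sqrt(d_k).
    scaled = [s / math.sqrt(d_k) for s in scores]
    # 3. Numerically stable softmax over the scaled scores.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    weights = [e / total for e in exps]
    # 4. Weighted sum of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]
```

Tokens whose keys align more closely with the query receive larger weights, so their value vectors dominate the output.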
Attention Score in Transformers