Short Answer

Calculating and Interpreting Attention Weights

In a simplified self-attention mechanism, the unnormalized attention scores (β) for a query token with respect to three key tokens in a sequence are [4.0, 2.0, 1.0]. Calculate the final normalized attention weight (α) that the query token places on the second key token (the one with a score of 2.0). Briefly explain the primary purpose of applying this normalization step to the raw scores. Show your calculation.
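Assuming the normalization intended here is the standard softmax (the usual choice in self-attention; the question does not name the function explicitly), the calculation can be sketched in Python. Softmax maps the raw scores to non-negative weights that sum to 1, which is the usual purpose of this normalization step:

```python
import math

def softmax(scores):
    """Normalize raw attention scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

beta = [4.0, 2.0, 1.0]   # unnormalized scores from the question
alpha = softmax(beta)
# alpha[1] is the weight the query places on the second key token (score 2.0)
print(round(alpha[1], 4))  # 0.1142
```

By hand: α₂ = e² / (e⁴ + e² + e¹) ≈ 7.389 / 64.705 ≈ 0.114.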

Updated 2025-10-08

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science