Dot Product Attention
Dot product attention is a fundamental form of multiplicative attention that scores a query against a key by their dot product. Geometrically, the dot product measures the alignment between vectors: for vectors of fixed length, the closer a query and key are in direction, the larger their dot product, whereas orthogonal vectors yield a dot product of $0$. Keys that are more closely related to the current query therefore receive larger attention scores. A notable characteristic of pure dot product attention is that it introduces no additional learnable parameters and relies entirely on the existing vector representations. In standard query-key notation, the attention score is $\mathbf{q}^\top \mathbf{k} = \sum_i q_i k_i$, where $\mathbf{q}$ represents the query vector and $\mathbf{k}$ represents the key vector.
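The scoring rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `dot` and `dot_attention_scores` and the two-dimensional example vectors are assumptions chosen for the illustration, and no normalization or scaling step is applied to the raw scores.

```python
def dot(q, k):
    """Dot product of two equal-length vectors given as lists of floats."""
    if len(q) != len(k):
        raise ValueError("query and key must have the same dimension")
    return sum(qi * ki for qi, ki in zip(q, k))


def dot_attention_scores(query, keys):
    """Score one query against each key; no learnable parameters are involved."""
    return [dot(query, k) for k in keys]


# Hypothetical example vectors (not taken from the source text).
query = [1.0, 0.0]
keys = [
    [0.9, 0.1],   # points in nearly the same direction as the query
    [0.0, 1.0],   # orthogonal to the query
    [-1.0, 0.0],  # points in the opposite direction
]

scores = dot_attention_scores(query, keys)
print(scores)  # [0.9, 0.0, -1.0]
```

The aligned key gets the largest score, the orthogonal key gets exactly zero, and the opposed key gets a negative score, which matches the geometric reading of the dot product.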
Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
D2L
Dive into Deep Learning @ D2L
Learn After
Example of Predicting Masked Words: Kitten Playing
Example of Masked Language Modeling: Kitten Chasing Ball
Example of Context-Based Prediction: Kitten Chasing Ball
In a sequence-to-sequence model, an attention mechanism calculates a score for each of three input vectors (A, B, and C) relative to a single output vector (D). The scoring function is the simple dot product between the output vector and each input vector. You are given the following geometric relationships:
- Vector A points in a very similar direction to Vector D.
- Vector B is orthogonal (at a 90-degree angle) to Vector D.
- Vector C points in the opposite direction of Vector D.
Which input vector will receive the highest attention score, and what is the underlying reason for this?
Evaluating Attention Mechanisms in Machine Translation
Calculating a Dot Attention Score