
Dot Attention

This is the first type of multiplicative attention and the easiest to implement: it uses the dot product as a similarity measure. Geometrically, the dot product depends on the angle between the vectors. The intuition is that vectors pointing in similar directions produce a large dot product, and the further their directions diverge, the smaller the dot similarity becomes; for example, the dot product of orthogonal vectors is zero. Under this scheme, the encoder vectors most related to the decoder vector receive the largest scores. One disadvantage is that it involves no learnable parameters, while the other multiplicative variants incorporate some form of learning:

$$\mathrm{score}(h, h'_{t}) = (h'_{t})^{T} h$$

where $h$ is an encoder hidden state and $h'_{t}$ is the decoder hidden state at step $t$.
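To make the scoring concrete, here is a minimal NumPy sketch of dot attention over a sequence of encoder states (the function name, shapes, and toy values are illustrative assumptions, not from the original). It computes the dot-product score of each encoder state against the decoder state, softmaxes the scores into attention weights, and forms the context vector. Note that plain dot attention requires the encoder and decoder hidden sizes to match, since no projection matrix sits between them.

```python
import numpy as np

def dot_attention(decoder_state, encoder_states):
    """Dot attention: score each encoder state by its dot product
    with the decoder state, then softmax into attention weights."""
    # scores[i] = (h'_t)^T h_i for every encoder hidden state h_i
    scores = encoder_states @ decoder_state        # shape: (num_steps,)
    # Softmax (shifted by the max for numerical stability)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector: weighted sum of encoder states
    context = weights @ encoder_states             # shape: (hidden_dim,)
    return weights, context

# Toy example: 4 encoder steps, hidden size 3 (assumed values)
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))    # encoder hidden states
h_t = rng.normal(size=3)       # current decoder hidden state
weights, context = dot_attention(h_t, h)
print(weights)  # encoder states most aligned with h_t get the largest weights
```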


Updated 2026-01-15

Tags

Data Science

Foundations of Large Language Models Course

Computing Sciences

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models