Learn Before
Additive Attention Scoring Function
Additive attention is a scoring function designed for situations where queries and keys reside in vector spaces of differing dimensionality, making a direct dot product infeasible. Introduced by Bahdanau et al. (2014), it projects both the query and the key into a shared hidden space of dimension using separate learned weight matrices and . These projections are summed element-wise and passed through a nonlinearity, after which a learned weight vector reduces the result to a scalar attention score:
This score is subsequently fed into a softmax function to produce nonnegative, normalized attention weights. An equivalent interpretation views additive attention as concatenating the query and key and feeding them through an MLP with a single hidden layer using activation. As its name suggests, the computation is additive rather than multiplicative, which can yield minor computational savings compared to approaches that require matrix products between query and key vectors.
0
1
Contributors are:
Who are from:
Tags
Data Science
D2L
Dive into Deep Learning @ D2L