Learn Before
Concept

General Attention function

This is similar to dot attention, but it adds a learnable component: the encoder state is first passed through a Dense layer before the dot product is taken. The attention mechanism itself is therefore subject to backpropagation and gradient descent.

$$\mathrm{score}(h, h'_t) = (h'_t)^{\top} W_a h$$

where $h$ is the encoder state, $h'_t$ is the decoder state at time $t$, and $W_a$ is the learned weight matrix of the Dense layer.
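To make the learned projection concrete, here is a minimal sketch in PyTorch (an assumed framework; the source names none). The class and parameter names (`GeneralAttention`, `hidden_dim`) are illustrative, not from the source.

```python
import torch
import torch.nn as nn

class GeneralAttention(nn.Module):
    """Sketch of general attention: score(h, h'_t) = (h'_t)^T W_a h."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        # W_a: the learned Dense layer applied to encoder states.
        self.W_a = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, decoder_state, encoder_states):
        # decoder_state:  (batch, hidden_dim)           -- h'_t
        # encoder_states: (batch, src_len, hidden_dim)  -- one h per source position
        projected = self.W_a(encoder_states)            # W_a h
        # Dot product of h'_t with each projected encoder state:
        scores = torch.bmm(projected, decoder_state.unsqueeze(2)).squeeze(2)
        # Because W_a is a trainable parameter, these scores participate
        # in backpropagation and gradient descent.
        return scores                                   # (batch, src_len)

attn = GeneralAttention(hidden_dim=8)
enc = torch.randn(1, 5, 8)   # 5 encoder states
dec = torch.randn(1, 8)      # one decoder state
print(attn(dec, enc).shape)  # torch.Size([1, 5])
```

The scores would typically be normalized with a softmax to produce the attention weights; with `bias=False` and an identity `W_a`, this reduces to plain dot attention.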


Updated 2020-10-10

Tags

Data Science