Learn Before
Attention Motivation
Attention is one of the most important innovations in deep learning of the last few years. The papers that introduced this mechanism used machine translation as their motivating example. Let's quickly review the encoder-decoder architecture. We have an encoder part and a decoder part. The encoder runs an RNN over the input and returns a single final context vector, which we later use during the decoding phase by feeding it to another RNN as its initial hidden state. One big problem with this is that when sentences get long, performance drops considerably, even though LSTMs are supposed to retain long-term information.

To handle long sentences, researchers came up with a technique called attention. The attention mechanism tries to mimic how we think: we first focus on different elements of a sentence or an image before describing what is in it. With attention, instead of only one vector being passed to the decoder, we pass the hidden state vectors from every time step.
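To make this concrete, here is a minimal NumPy sketch of that last idea: the decoder receives all encoder hidden states and, at each decoding step, forms a context vector as an attention-weighted sum of them. It uses simple dot-product scoring for brevity; the Bahdanau et al. paper linked below actually scores alignments with a small feed-forward network. The function and variable names are illustrative, not from any particular library.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_context(encoder_states, decoder_state):
    """Context vector = attention-weighted sum of ALL encoder
    hidden states (one per input time step), rather than only
    the final one.

    encoder_states: (T, d) -- hidden state at each of T time steps
    decoder_state:  (d,)   -- current decoder hidden state (the query)
    """
    # Alignment scores: how relevant each encoder state is to the
    # current decoder state. Dot-product scoring is a simplification;
    # additive (Bahdanau-style) scoring is another common choice.
    scores = encoder_states @ decoder_state   # (T,)
    weights = softmax(scores)                 # (T,), sums to 1
    context = weights @ encoder_states        # (d,)
    return context, weights

# Toy usage: 5 input time steps, hidden size 4.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))   # all encoder hidden states
s = rng.normal(size=(4,))     # current decoder hidden state
context, weights = attention_context(H, s)
print(weights)  # shows which input steps the decoder "focuses" on
```

The key design point is that `weights` is recomputed at every decoding step, so the decoder can focus on different parts of the input sentence as it produces each output word.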
Tags
Data Science
Related
Neural Machine Translation by Jointly Learning to Align and Translate
Effective Approaches to Attention-based Neural Machine Translation
Attention Motivation
Example of how Attention is used in Machine Translation
The Illustrated Transformer
Attention Is All You Need
Attention is all you need; Attentional Neural Network Models | Łukasz Kaiser | Masterclass
Tensor2Tensor Intro
Transformer model
Transformer
Efficient Transformers: A Survey
Evaluation of Efficient Transformers