Attention Decoder
This is where the biggest upgrade happens. The RNN cell itself is left untouched, but we now use the encoder hidden states, together with the current decoder hidden state, to generate each word. The word we generate is therefore a function of all encoder hidden states and the current decoder state. But inputs of different lengths produce different numbers of encoder hidden states, so how do we fix the input dimension of that output function? One way is to weight each encoder hidden state by how strongly it relates to the current decoder state and then sum them up. For each encoder hidden state we define a score function:
Score(encoder_hidden_state, decoder_hidden_state)
(Figure: each encoder state is scored against the current decoder state; the scores are softmax-normalized, and the weighted encoder states are summed into a context vector.)
Here you can see how the scoring function works for each decoder state: we score each encoder state, apply a softmax to normalize the scores into attention weights, and take the weighted sum of the encoder states. The resulting context vector is then used, together with the decoder state, to determine the current word.
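The score-softmax-sum pipeline above can be sketched in a few lines of numpy. This is a minimal illustration, assuming a simple dot-product score function (other choices, such as a small learned network, are also common); the function and variable names are ours, not from any particular library.

```python
import numpy as np

def attention_context(encoder_states, decoder_state):
    """Dot-product attention: score each encoder hidden state against
    the current decoder state, softmax-normalize the scores, and return
    the weighted sum (the context vector) plus the attention weights."""
    # One score per encoder hidden state, however many there are.
    scores = encoder_states @ decoder_state          # shape: (T,)
    # Softmax turns the scores into weights that are >= 0 and sum to 1.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector: fixed-size weighted sum of the encoder states.
    context = weights @ encoder_states               # shape: (H,)
    return context, weights

# The context vector is always H-dimensional, for any sequence length T.
T, H = 5, 4
rng = np.random.default_rng(0)
enc = rng.normal(size=(T, H))   # T encoder hidden states
dec = rng.normal(size=H)        # current decoder hidden state
ctx, w = attention_context(enc, dec)
print(ctx.shape, round(w.sum(), 6))  # (4,) 1.0
```

Note how this answers the fixed-dimension question from above: whether the input has 5 or 500 tokens, the context vector handed to the decoder is always the same size.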
Tags: Data Science