Learn Before
Transformer Decoder
The image shows the structure of the decoder, which is very similar to the encoder we just described. One difference is that in each decoder block there is an encoder-decoder attention layer: the keys K and values V are passed in from the encoder output, while the queries come from the previous decoder layer, and each decoder query is compared against the encoder keys just as in a classic seq2seq model with attention. The other difference is a layer of so-called masked self-attention: at each time step, the query is not compared with keys from future positions.
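To make the two attention patterns concrete, here is a minimal NumPy sketch. The learned Q/K/V projection matrices, multi-head splitting, residual connections, and layer normalization are omitted; the `attention` helper, shapes, and variable names are illustrative assumptions rather than a reference implementation.

```python
import numpy as np

def attention(Q, K, V, mask=None):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (tgt_len, src_len)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # block disallowed positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

tgt_len, src_len, d_model = 4, 6, 8
decoder_x = np.random.randn(tgt_len, d_model)    # decoder-side activations
encoder_out = np.random.randn(src_len, d_model)  # encoder output

# 1) Masked self-attention: Q, K, V all come from the decoder itself,
#    and a lower-triangular (causal) mask stops each position from
#    attending to future positions.
causal_mask = np.tril(np.ones((tgt_len, tgt_len), dtype=bool))
self_attended = attention(decoder_x, decoder_x, decoder_x, mask=causal_mask)

# 2) Encoder-decoder attention: queries come from the decoder,
#    while keys and values come from the encoder output,
#    just like attention in a classic seq2seq model.
cross_attended = attention(self_attended, encoder_out, encoder_out)
print(cross_attended.shape)  # (tgt_len, d_model)
```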

Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Learn After
Core Components of a Transformer Decoding Network
Masked Self-Attention in Transformer Decoders
A developer is building a model designed to generate text sequentially, where each new word is predicted based on the words that came before it. They consider modifying the model by removing the specific constraint that prevents a position in the sequence from attending to subsequent positions. What is the most likely consequence of this change on the model's training and generation capabilities?
A standard Transformer decoder block contains two distinct attention sub-layers. Which statement accurately differentiates the roles and data sources for these two sub-layers?
Within a single decoder block of a standard Transformer architecture, information is processed through three main computational sub-layers. Arrange these sub-layers in the correct operational sequence.