
The Transformer encoder:

As the image shows, each Transformer encoder block consists of three parts (sketched in code after the list):

- Self-attention layer
- Feed-forward layer
- Add & normalize
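
In the original architecture, "add & normalize" wraps each of the other two sub-layers: a residual connection followed by layer normalization. Below is a minimal NumPy sketch of one encoder block; the single attention head, the ReLU feed-forward network, and all parameter names and sizes are illustrative assumptions, not the exact published implementation.

```python
# A minimal sketch of one Transformer encoder block (single-head
# attention, illustrative sizes); not the exact paper code.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-6):
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def self_attention(x, w_q, w_k, w_v):
    # Project the same input into queries, keys, and values.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # scaled dot products
    return softmax(scores) @ v                # weighted sum of values

def encoder_block(x, params):
    # 1) Self-attention sub-layer, then 2) add (residual) & normalize.
    attn = self_attention(x, params["w_q"], params["w_k"], params["w_v"])
    x = layer_norm(x + attn)
    # 3) Feed-forward sub-layer, then add & normalize again.
    ff = np.maximum(0, x @ params["w1"]) @ params["w2"]   # ReLU MLP
    return layer_norm(x + ff)

rng = np.random.default_rng(0)
d, d_ff, seq_len = 8, 32, 5            # illustrative sizes
params = {name: rng.normal(size=shape) * 0.1
          for name, shape in [("w_q", (d, d)), ("w_k", (d, d)),
                              ("w_v", (d, d)), ("w1", (d, d_ff)),
                              ("w2", (d_ff, d))]}
x = rng.normal(size=(seq_len, d))      # one sequence of token vectors
print(encoder_block(x, params).shape)  # (5, 8): same shape in, same shape out
```

Note that the block maps a sequence of vectors to a sequence of the same shape, which is why encoder blocks can be stacked on top of one another.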

We already know how the feed-forward layer works. The main mystery here is the self-attention layer. To understand it better, I will split the explanation into several parts, modifying the usual seq2seq model encoder step by step:

[Image: the usual seq2seq model encoder]
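
For reference, before any modification, the usual seq2seq encoder reads the input with a recurrent network, one token at a time. A minimal sketch of such an RNN encoder follows; every function name and dimension here is an illustrative assumption.

```python
# A minimal sketch of the usual RNN-based seq2seq encoder that the
# following steps modify; names and sizes are illustrative assumptions.
import numpy as np

def rnn_encoder(inputs, w_xh, w_hh, b_h):
    # Read the sequence one token at a time, carrying a hidden state.
    h = np.zeros(w_hh.shape[0])
    states = []
    for x in inputs:                   # sequential: step t needs step t-1
        h = np.tanh(x @ w_xh + h @ w_hh + b_h)
        states.append(h)
    return np.stack(states)            # one hidden state per input token

rng = np.random.default_rng(0)
d_in, d_h, seq_len = 8, 16, 5
inputs = rng.normal(size=(seq_len, d_in))
states = rnn_encoder(inputs,
                     rng.normal(size=(d_in, d_h)) * 0.1,
                     rng.normal(size=(d_h, d_h)) * 0.1,
                     np.zeros(d_h))
print(states.shape)  # (5, 16)
```

Unlike the encoder block sketched earlier, this baseline is sequential: each hidden state depends on the previous one, which is precisely the dependency that self-attention is designed to remove.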

