Self-Attention layer understanding - Step 5 - Adding the time
As you may have noticed, in the current state the order of the words does not matter at all: we could permute the sentence and the result would stay the same. Instead of using an RNN to account for order, we can compute a positional encoding for each timestamp and simply add it to the word embeddings (note that we do this once, right after the embedding layer). The positional encoding is designed so that the vectors projected into Q/K/V keep a meaningful distance between positions. Here is an example of how it is calculated for 20 words (rows) with an embedding size of 512 (columns).
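A minimal sketch of that calculation, assuming the standard sinusoidal positional encoding from the original Transformer paper (sine on even dimensions, cosine on odd ones); the helper name and the NumPy implementation are just illustrative, but the 20×512 shape matches the example above.

```python
import numpy as np

def sinusoidal_positional_encoding(num_positions: int, d_model: int) -> np.ndarray:
    """Return a (num_positions, d_model) matrix of sinusoidal positional encodings."""
    positions = np.arange(num_positions)[:, np.newaxis]        # (num_positions, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]             # (1, d_model / 2)
    angle_rates = 1.0 / np.power(10000.0, dims / d_model)      # one frequency per dimension pair
    angles = positions * angle_rates                            # (num_positions, d_model / 2)

    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)   # even columns: sine
    pe[:, 1::2] = np.cos(angles)   # odd columns: cosine
    return pe

# 20 words (rows) with an embedding size of 512 (columns), as in the example above
pe = sinusoidal_positional_encoding(20, 512)
print(pe.shape)  # (20, 512)

# The encoding is added to the word embeddings once, right after the embedding layer:
# word_embeddings = ...                      # hypothetical, shape (20, 512)
# attention_input = word_embeddings + pe     # this sum is what the Q/K/V projections see
```

Because each position gets a unique pattern of phases across the 512 columns, two permuted sentences no longer produce identical representations after the projection into Q/K/V.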
