Learn Before
Contextual Token Representation in Sub-layers
In a Transformer architecture, both the input and output of a sub-layer are structured as an m × d matrix, where m denotes the sequence length and d represents the model dimensionality. Within these matrices, the i-th row serves as a contextual representation of the i-th token in the sequence, encoding its meaning relative to the surrounding tokens.
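The shape-preserving behavior described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full Transformer sub-layer: the toy values m=4, d=8, the single linear map used as the sub-layer function F, and the post-norm wiring are all assumptions for demonstration.

```python
import numpy as np

m, d = 4, 8                    # sequence length m, model dimensionality d (toy values)
rng = np.random.default_rng(0)

X = rng.normal(size=(m, d))    # sub-layer input: one row per token

def layer_norm(Z, eps=1e-5):
    # normalize each token's row independently (no learned scale/shift here)
    mu = Z.mean(axis=-1, keepdims=True)
    var = Z.var(axis=-1, keepdims=True)
    return (Z - mu) / np.sqrt(var + eps)

# F: a stand-in sub-layer function (a single position-wise linear map)
W = rng.normal(size=(d, d))
def F(Z):
    return Z @ W

# post-norm wiring: residual connection, then layer normalization
Y = layer_norm(X + F(X))

print(X.shape, Y.shape)        # both (4, 8): the m × d shape is preserved
```

Row i of both X and Y corresponds to token i, which is what lets sub-layers be stacked: the output matrix of one sub-layer is directly usable as the input matrix of the next.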
Tags
Foundations of Large Language Models
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A single sub-layer within a neural network block receives an input tensor x and applies a function F to it. The block's architecture specifies that a residual connection and layer normalization are used. Which of the following sequences of operations correctly implements the post-normalization scheme for this sub-layer?
Generalized Formula for Post-Norm Architecture
A standard processing block in a neural network consists of two main sub-layers: a self-attention module and a feed-forward network (FFN). This block uses a post-normalization architecture, where a residual connection is followed by a normalization step for each sub-layer. Arrange the following computational steps in the correct sequence for a single input passing through one complete block.
Debugging a Transformer Block Implementation
In a Transformer block sub-layer that uses a post-normalization architecture, the layer normalization operation is applied to the input before the sub-layer's primary function (e.g., self-attention or feed-forward network) is executed.
You're debugging a Transformer block in an interna...
You are reviewing a teammate's implementation of a...
You're implementing a single Transformer block in ...
Design a Transformer Block Spec for a New Internal LLM Library (Shapes + Norm Placement)
Diagnosing a Transformer Block Refactor: Attention/FFN Shapes and Norm Placement
Choosing Pre-Norm vs Post-Norm for a Deep Transformer: Stability, Shapes, and Sub-layer Semantics
Root-Cause Analysis of Training Instability After a “Minor” Transformer Block Change
Production Bug Triage: Transformer Block Norm Placement vs Attention/FFN Interface Contracts
Post-Norm vs Pre-Norm Migration: Verifying Tensor Shapes and Correct Sub-layer Wiring
Incident Review: Silent Performance Regression After “Optimization” of a Transformer Block
Contextual Token Representation in Sub-layers
Core Function in Transformer Sub-layers