Initial Input Representation for Transformer Layers
In a decoder-only Transformer model, the sequence of input tokens is represented by a sequence of d-dimensional vectors, denoted e_0, ..., e_{m-1}. For a given position i, the vector e_i is computed as the sum of the token embedding x_i for the specific token and its corresponding positional embedding PE(i), i.e., e_i = x_i + PE(i). This final sequence of vectors forms the initial input that is fed into the stack of Transformer blocks.
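A minimal NumPy sketch of this lookup-and-sum step, assuming learned absolute positional embeddings; the table sizes and all names (VOCAB_SIZE, D_MODEL, initial_input) are illustrative assumptions, not details from the course:

```python
import numpy as np

# Hypothetical sizes: a 50,000-token vocabulary, model width d = 512,
# and a maximum context length of 1,024 positions.
VOCAB_SIZE, D_MODEL, MAX_LEN = 50_000, 512, 1_024

rng = np.random.default_rng(0)
token_embedding = rng.normal(size=(VOCAB_SIZE, D_MODEL))    # one row per token id
positional_embedding = rng.normal(size=(MAX_LEN, D_MODEL))  # one row per position

def initial_input(token_ids: list[int]) -> np.ndarray:
    """Return the (sequence_length, d) matrix fed into the first Transformer block.

    For each position i, e_i = x_i + PE(i): the embedding of the token at
    position i plus the embedding of position i itself.
    """
    positions = np.arange(len(token_ids))
    return token_embedding[token_ids] + positional_embedding[positions]

# Example: a 4-token input yields four d-dimensional vectors.
e = initial_input([17, 933, 4_021, 8])
print(e.shape)  # (4, 512)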
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.5 Inference - Foundations of Large Language Models
Related
Training Decoder-Only Language Models with Cross-Entropy Loss
Output Probability Calculation in Transformer Language Models
Global Nature of Standard Transformer LLMs
Processing Flow of Autoregressive Generation in a Decoder-Only Transformer
Initial Input Representation for Transformer Layers
Greedy Decoding in Language Models
Structure of a Transformer Block
A generative language model is designed to produce text by predicting the next token based solely on the sequence of tokens that came before it. If you were to adapt a standard Transformer decoder block for this specific auto-regressive task, which of its sub-layers would you remove, and why is this modification functionally necessary?
A language model is constructed using a stack of modified Transformer decoder blocks. Each block contains a self-attention sub-layer and a feed-forward network sub-layer, but lacks the sub-layer that would process information from a separate, secondary input sequence. This model is capable of performing a machine translation task, such as translating a German sentence into English, without any further architectural changes.
Function of Self-Attention in Auto-regressive Generation
Neural Network-Based Next-Token Probability Distribution
Self-Attention layer understanding - Step 5 - Adding the time
Input Embedding with Positional Encoding
Learnable Absolute Positional Embeddings
Comparison of Arbitrary Order Prediction and Masked Language Modeling
An engineer builds a language model where all input words in a sentence are processed simultaneously and independently before their information is combined. When testing the model with the sentences 'The cat chased the dog' and 'The dog chased the cat', the engineer observes that the model generates identical internal representations for both, failing to capture their different meanings. Which of the following modifications would most directly address this fundamental flaw?
Model Architecture Design Choice
Analyzing Order-Insensitivity in Language Models
Learn After
Layer-wise Processing in Transformer Inference
Initial Representation for Concatenated [x, y] Sequences
Calculating an Initial Input Vector
A decoder-only model is preparing the input sequence 'The quick brown fox' for processing. To create the initial input representation for the token 'brown' (at position 2), the model retrieves its token embedding vector, V_brown, and the positional embedding vector for position 2, P_2. Which of the following correctly describes the operation used to combine these two vectors into the final representation that is fed into the first layer of the model?
A decoder-only Transformer model is given a sequence of tokens as input. Arrange the following steps in the correct chronological order to describe how the model creates the initial representation that is fed into its first layer.
Input Representation for a Single Token in Autoregressive Generation