Concept

Initial Input Representation for Transformer Layers

In a decoder-only Transformer model, the sequence of input tokens is represented by a sequence of $d_e$-dimensional vectors, denoted $\{\mathbf{e}_0, \ldots, \mathbf{e}_{m-1}\}$. For a given position $i$, the vector $\mathbf{e}_i$ is computed as the sum of the token embedding for the specific token $x_i$ and the positional embedding for position $i$. This sequence of vectors forms the initial input that is fed into the stack of Transformer blocks.
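The sum of token and positional embeddings can be sketched as follows. This is a minimal illustration, not the book's implementation: the table sizes (`vocab_size`, `max_len`, `d_e`) and the random initialization are assumptions made for the example; in a real model both lookup tables are learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration only.
vocab_size, max_len, d_e = 100, 16, 8

# Lookup tables: one row per token id / per position.
# In practice these are trained; here they are random placeholders.
token_embedding = rng.normal(size=(vocab_size, d_e))
positional_embedding = rng.normal(size=(max_len, d_e))

def embed(token_ids):
    """Return {e_0, ..., e_{m-1}}: e_i = token_embedding[x_i] + positional_embedding[i]."""
    m = len(token_ids)
    return token_embedding[token_ids] + positional_embedding[:m]

x = [3, 17, 42, 7]   # token ids x_0, ..., x_3
e = embed(x)
print(e.shape)       # (4, 8): one d_e-dimensional vector per position
```

Each row `e[i]` is the $d_e$-dimensional input vector for position $i$; the whole array is what the first Transformer block receives.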


Updated 2026-05-02


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.5 Inference - Foundations of Large Language Models