Input Representation for a Single Token in Autoregressive Generation
During autoregressive generation, the model computes an embedding for the token at the current position i'. This embedding typically combines the token's semantic (token) embedding with its positional embedding, and the result serves as the initial input representation fed into the stack of Transformer layers for processing.
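A minimal sketch of this combination step, assuming the common case where the two embeddings are combined by element-wise addition (the lookup tables, sizes, and token IDs below are hypothetical and randomly initialized purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, max_len, d_model = 100, 16, 8

# Learned lookup tables (randomly initialized here for illustration).
token_embeddings = rng.normal(size=(vocab_size, d_model))
positional_embeddings = rng.normal(size=(max_len, d_model))

def input_representation(token_id: int, position: int) -> np.ndarray:
    """Combine semantic and positional information by element-wise addition."""
    return token_embeddings[token_id] + positional_embeddings[position]

# Initial representation for the token just generated at position i' = 4.
x = input_representation(token_id=42, position=4)
print(x.shape)  # (8,)
```

This vector `x` is what would be fed into the first Transformer layer; real models use learned (or fixed sinusoidal) positional embeddings and far larger dimensions.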
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Layer-wise Processing in Transformer Inference
Initial Representation for Concatenated [x, y] Sequences
Calculating an Initial Input Vector
A decoder-only model is preparing the input sequence 'The quick brown fox' for processing. To create the initial input representation for the token 'brown' (at position 2), the model retrieves its token embedding vector, V_brown, and the positional embedding vector for position 2, P_2. Which of the following correctly describes the operation used to combine these two vectors into the final representation that is fed into the first layer of the model?
A decoder-only Transformer model is given a sequence of tokens as input. Arrange the following steps in the correct chronological order to describe how the model creates the initial representation that is fed into its first layer.
Learn After
An autoregressive language model is generating text one token at a time. It has just produced the token 'blue' as the fourth token in the sequence 'The sky is blue'. To determine the fifth token, the model must first create an input representation for the token 'blue' at position 4. How is this initial representation for 'blue' typically constructed before it is fed into the model's processing layers?
Input Vector Creation in Autoregressive Generation
Input Vector Construction During Text Generation