Input Representation for a Single Token in Autoregressive Generation
During autoregressive generation, the model computes an embedding for the token at the current position i'. This embedding typically combines the token's semantic (token) embedding with its positional embedding, and the result serves as the initial input representation fed into the stack of Transformer layers for processing.
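A minimal sketch of this combination step, assuming the common case where the two embeddings are combined by element-wise addition (the lookup tables, sizes, and token IDs below are hypothetical and randomly initialized purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, max_len, d_model = 100, 16, 8

# Learned lookup tables (randomly initialized here for illustration).
token_embeddings = rng.normal(size=(vocab_size, d_model))
positional_embeddings = rng.normal(size=(max_len, d_model))

def input_representation(token_id: int, position: int) -> np.ndarray:
    """Combine semantic and positional information by element-wise addition."""
    return token_embeddings[token_id] + positional_embeddings[position]

# Initial representation for the token just generated at position i' = 4.
x = input_representation(token_id=42, position=4)
print(x.shape)  # (8,)
```

This vector `x` is what would be fed into the first Transformer layer; real models use learned (or fixed sinusoidal) positional embeddings and far larger dimensions.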
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Layer-wise Processing in Transformer Inference
Initial Representation for Concatenated [x, y] Sequences
Calculating an Initial Input Vector
A decoder-only model is preparing the input sequence 'The quick brown fox' for processing. To create the initial input representation for the token 'brown' (at position 2), the model retrieves its token embedding vector, V_brown, and the positional embedding vector for position 2, P_2. Which of the following correctly describes the operation used to combine these two vectors into the final representation that is fed into the first layer of the model?
A decoder-only Transformer model is given a sequence of tokens as input. Arrange the following steps in the correct chronological order to describe how the model creates the initial representation that is fed into its first layer.
Learn After
An autoregressive language model is generating text one token at a time. It has just produced the token 'blue' as the fourth token in the sequence 'The sky is blue'. To determine the fifth token, the model must first create an input representation for the token 'blue' at position 4. How is this initial representation for 'blue' typically constructed before it is fed into the model's processing layers?
Input Vector Creation in Autoregressive Generation
Input Vector Construction During Text Generation