A decoder-only Transformer model is given a sequence of tokens as input. Arrange the following steps in the correct chronological order to describe how the model creates the initial representation that is fed into its first layer.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.5 Inference - Foundations of Large Language Models
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Layer-wise Processing in Transformer Inference
Initial Representation for Concatenated [x, y] Sequences
Calculating an Initial Input Vector
A decoder-only model is preparing the input sequence 'The quick brown fox' for processing. To create the initial input representation for the token 'brown' (at position 2), the model retrieves its token embedding vector,
V_brown, and the positional embedding vector for position 2, P_2. Which of the following correctly describes the operation used to combine these two vectors into the final representation that is fed into the first layer of the model?
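As a minimal sketch of the operation the question is probing: in a standard Transformer, the token embedding and the positional embedding are combined by element-wise addition. The vocabulary, embedding dimension, and randomly initialized tables below are illustrative stand-ins, not real model weights:

```python
import numpy as np

# Toy setup: how a decoder-only Transformer typically builds the initial
# representation for the token 'brown' at position 2 in 'The quick brown fox'.
rng = np.random.default_rng(0)
vocab = {"The": 0, "quick": 1, "brown": 2, "fox": 3}
d_model = 8                                                # toy embedding size
token_embeddings = rng.normal(size=(len(vocab), d_model))  # V table (stand-in)
positional_embeddings = rng.normal(size=(16, d_model))     # P table (stand-in)

v_brown = token_embeddings[vocab["brown"]]  # V_brown: token embedding lookup
p_2 = positional_embeddings[2]              # P_2: positional embedding lookup

# The two vectors are combined by element-wise addition; the sum is the
# representation fed into the first layer.
initial_representation = v_brown + p_2
assert initial_representation.shape == (d_model,)
```

The same sketch also answers the ordering question above: tokenize, look up each token's embedding, look up each position's embedding, then add the two vectors per position.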
Input Representation for a Single Token in Autoregressive Generation