Initial Representation for Concatenated [x, y] Sequences
For a concatenated sequence [x, y], the initial input representation for the Transformer stack is generated on a per-token basis. For each position i' in the combined sequence, the model sums the token's embedding vector with the positional embedding vector for position i'. Positions are indexed over the whole concatenation, so the first token of y receives the positional embedding for position |x|, not position 0. The resulting sum is the initial representation for that position and is fed into the first Transformer layer.
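The per-token computation can be sketched as follows. This is a minimal illustration, assuming lookup-table embeddings; the vocabulary size, model width, and token ids are hypothetical values, not taken from the text.

```python
import numpy as np

# Illustrative dimensions (assumptions, not from the text).
rng = np.random.default_rng(0)
vocab_size, d_model, max_len = 100, 16, 32

token_embedding = rng.normal(size=(vocab_size, d_model))    # one row per token id
positional_embedding = rng.normal(size=(max_len, d_model))  # one row per position

x = [5, 12, 7]    # hypothetical token ids for the prompt x
y = [3, 9]        # hypothetical token ids for the completion y
sequence = x + y  # concatenated [x, y]

# For each position i' in the combined sequence, the initial representation
# is the sum of the token embedding and the positional embedding for i'.
initial = np.stack([
    token_embedding[tok] + positional_embedding[i]
    for i, tok in enumerate(sequence)
])

# Positions run over the whole concatenation: the first token of y sits at
# position len(x), not position 0.
assert np.allclose(
    initial[len(x)],
    token_embedding[y[0]] + positional_embedding[len(x)],
)
print(initial.shape)  # (5, 16): one d_model-sized vector per token of [x, y]
```

The stack of vectors `initial` is what enters the first Transformer layer; everything after that is ordinary layer-wise processing.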

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Layer-wise Processing in Transformer Inference
Initial Representation for Concatenated [x, y] Sequences
Calculating an Initial Input Vector
A decoder-only model is preparing the input sequence 'The quick brown fox' for processing. To create the initial input representation for the token 'brown' (at position 2), the model retrieves its token embedding vector, V_brown, and the positional embedding vector for position 2, P_2. Which of the following correctly describes the operation used to combine these two vectors into the final representation that is fed into the first layer of the model?
A decoder-only Transformer model is given a sequence of tokens as input. Arrange the following steps in the correct chronological order to describe how the model creates the initial representation that is fed into its first layer.
Input Representation for a Single Token in Autoregressive Generation
A data scientist is using a large language model to determine the conditional log-probability of a specific completion y following a given prompt x. Their process involves concatenating the two sequences into [x, y] and then performing a single forward pass to compute the log-probability of this combined sequence, which they take as their final result. Which statement best analyzes the flaw in this methodology?
You are tasked with using a large language model to compute the conditional log-probability of an output sequence y given an input sequence x. Arrange the following computational steps into the correct chronological order.
Calculating Conditional Log-Probability from Model Outputs
Learn After
A language model is given two token sequences: sequence x with 10 tokens (at positions 0 through 9) and sequence y with 5 tokens. To process them together, they are concatenated into a single sequence [x, y]. How is the initial input vector for the very first token of the original sequence y calculated before being passed to the first processing layer?
Positional Context in Concatenated Sequences
Evaluating a Representation Generation Method