Diagram of the Autoregressive Generation Architectural Flow
This diagram illustrates the architectural flow for a single step of autoregressive generation in a decoder-only Transformer. The process begins with the input sequence, formed by concatenating the prompt tokens (x_0, …, x_m) and any previously generated tokens (y_1, …, y_{i-1}). This sequence is fed into an embedding layer. The resulting embeddings are then processed through a stack of L decoder layers, each comprising self-attention and feed-forward network (FFN) modules. The output from the final layer is passed through a linear mapping and a Softmax layer to compute the conditional probability distribution, Pr(·|x, y_{<i}), for the next token.
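The flow above can be sketched as a small loop. This is a minimal illustration only: the real embedding layer, decoder stack, and learned linear mapping are replaced here by a toy `forward` function that returns deterministic logits, and names like `vocab_size` and `generate_step` are assumptions for the sketch, not from the source.

```python
import math

vocab_size = 5  # toy vocabulary for illustration

def forward(sequence):
    """Stand-in for embedding -> L decoder layers -> linear mapping.

    Returns logits over the vocabulary for the next-token position.
    A real model attends over the whole sequence; this toy version
    just derives logits from the last token.
    """
    last = sequence[-1]
    return [float((last + v) % vocab_size) for v in range(vocab_size)]

def softmax(logits):
    """Convert logits into the conditional distribution Pr(. | x, y_<i)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate_step(prompt, generated):
    """One autoregressive step: concatenate (x_0..x_m) with (y_1..y_{i-1}),
    run the model, and pick the next token greedily from the distribution."""
    sequence = prompt + generated
    probs = softmax(forward(sequence))
    return max(range(vocab_size), key=lambda t: probs[t])

prompt = [0, 1, 2]
generated = []
for _ in range(3):  # each new token is appended and fed back as input
    generated.append(generate_step(prompt, generated))
print(generated)
```

Note how each step re-feeds the full prompt plus everything generated so far; this concatenation is exactly what makes the process autoregressive.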
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Logits in Transformer Language Models
Final Hidden States in a Transformer Language Model
Next-Token Probability Calculation in Autoregressive Decoders
Diagram of the Decoding Phase
Diagram of the Transformer Language Model Forward Pass
Diagram of the Autoregressive Generation Architectural Flow
A decoder-only language model generates text one token at a time in a step-by-step process. Arrange the following steps in the correct chronological order for generating a single new token, given an initial prompt and any previously generated tokens.
In the step-by-step generation process of a decoder-only language model, consider a hypothetical modification at generation step i: instead of using the initial prompt combined with all previously generated tokens as input, the model is only given the initial prompt. What is the most likely consequence of this change on the generated text?
Diagnosing a Generation Failure in a Decoder-Only Model
Learn After
A decoder-only language model generates text one token at a time. Arrange the following computational steps in the correct order for generating a single new token, given a prompt and any previously generated tokens.
In the architectural flow for generating a single new token, a decoder-only model processes the input sequence through multiple layers. After the final decoder layer produces its output vector, what is the immediate and primary purpose of applying a final linear mapping and a Softmax function?
In the architectural flow for generating a single new token, a decoder-only model performs several distinct operations. Match each architectural component with its primary function during this single-step process.