Learn Before
Diagram of the N-th Step in Transformer Decoding
This diagram illustrates the n-th step of the decoding process in a Transformer. At this stage, the model has already processed the initial prompt (x) and generated n-1 tokens (y_1 through y_{n-1}). The current input is passed through an embedding layer, and the resulting embedding is processed by a self-attention layer. The query is derived from the current step's input, and it attends to keys and values derived from the initial prompt and all previously generated tokens. The process culminates in a Softmax layer over the vocabulary that produces the conditional probability of the next token, Pr(y_n | x, y_{<n}), continuing the step-by-step generation.
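The step described above can be sketched numerically. This is a minimal single-head NumPy illustration, not a real model: the weight matrices (Wq, Wk, Wv, W_out), the embedding table E, the dimensions, and the token ids are all made-up placeholders. It shows the key structural point of the diagram: at step n, only one new query is computed, while the keys and values cover the prompt plus every token generated so far.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8        # model/embedding dimension (illustrative)
vocab = 16   # vocabulary size (illustrative)

# Random matrices standing in for learned weights (placeholders).
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
W_out = rng.normal(size=(d, vocab))   # output projection to vocabulary
E = rng.normal(size=(vocab, d))       # token embedding table

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Prompt x plus the n-1 already-generated tokens y_1..y_{n-1}.
context_ids = [3, 7, 2, 9, 5]         # hypothetical token ids
H = E[context_ids]                    # embeddings of the whole context
K = H @ Wk                            # keys for prompt + prior tokens
V = H @ Wv                            # values for prompt + prior tokens

# N-th step: the query comes from the current step's input only,
# i.e. the most recently generated token.
q = E[context_ids[-1]] @ Wq

# Self-attention: one query attends over all cached keys/values.
attn = softmax(q @ K.T / np.sqrt(d))  # weights over the full context
ctx = attn @ V                        # attended context vector

# Final Softmax over the vocabulary gives Pr(y_n | x, y_<n).
probs = softmax(ctx @ W_out)
next_token = int(probs.argmax())
```

In practice the keys and values for earlier positions are not recomputed each step; they are cached from previous steps (the KV cache), which is exactly why only the current token's query needs to be formed at step n.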
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Decoding Phase as a Memory-Bound Process
Diagram of the N-th Step in Transformer Decoding
A large language model has processed an initial prompt and has just generated the fifth token of its output. As it prepares to generate the sixth token, which of the following statements most accurately describes the function of the self-attention mechanism in this specific step?
A large language model is generating a response one token at a time after processing the initial prompt. Arrange the following actions in the correct sequence to describe how a single new token is generated.
Q, K, and V Composition in Transformer Decoding
Analyzing a Flawed Decoding Step
Learn After
A Transformer-based language model is given the prompt 'The quick brown fox' and begins generating a continuation. It has already produced the tokens 'jumps', 'over'. The model is now at the step of generating the next token after 'over'. During the self-attention calculation at this specific step, which set of tokens provides the source for the keys and values that the current token's query will attend to?
A Transformer decoder is at the N-th step of generating an output sequence, having already processed an initial prompt and the first N-1 output tokens. Arrange the following key operations that occur during this specific N-th step in the correct chronological order.
Contextual Attention in Sentence Completion