Example

Diagram of the N-th Step in Transformer Decoding

This diagram illustrates the n-th step of the decoding process in a Transformer. At this stage, the model has already consumed the initial prompt x and generated n-1 tokens, y_1 to y_{n-1}. The input at the current step is passed through an embedding layer, and the resulting embedding is processed by a self-attention layer. That layer forms a query from the current step's input and attends over the keys and values of the prompt and of all previously generated tokens. A final Softmax layer then yields the conditional distribution of the next token, Pr(y_n | x, y_{<n}), and generation continues step by step.
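To make the data flow concrete, here is a minimal NumPy sketch of a single decoding step with a key/value cache. It assumes a single attention head with no residual connections, layer normalization, or multi-layer stack; the function name decode_step, the weight names W_q, W_k, W_v, W_out, and all dimensions are hypothetical stand-ins, not notation from the book.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def decode_step(x_n, k_cache, v_cache, params):
    """One decoding step: the current token's query attends over the
    cached keys/values of the prompt and all earlier generated tokens,
    and a Softmax layer maps the result to Pr(y_n | x, y_<n)."""
    W_q, W_k, W_v, W_out = (params[k] for k in ("W_q", "W_k", "W_v", "W_out"))

    # Project the current step's embedding; extend the KV cache in place.
    q = x_n @ W_q
    k_cache.append(x_n @ W_k)
    v_cache.append(x_n @ W_v)
    K, V = np.stack(k_cache), np.stack(v_cache)   # (n, d) each

    # Scaled dot-product attention over all n positions seen so far.
    attn = softmax(K @ q / np.sqrt(q.shape[-1]))  # (n,)
    context = attn @ V                            # (d,)

    # Output Softmax layer: distribution over the vocabulary.
    return softmax(context @ W_out)               # (vocab_size,)

# Tiny demo with random weights and random stand-in embeddings.
rng = np.random.default_rng(0)
d, vocab = 16, 100
params = {"W_q": rng.normal(size=(d, d)), "W_k": rng.normal(size=(d, d)),
          "W_v": rng.normal(size=(d, d)), "W_out": rng.normal(size=(d, vocab))}
k_cache, v_cache = [], []
for n in range(5):                     # prompt tokens, then generated tokens
    x_n = rng.normal(size=d)           # stand-in for the embedding at step n
    p_next = decode_step(x_n, k_cache, v_cache, params)
print(p_next.shape, round(p_next.sum(), 6))  # (100,) 1.0
```

Caching the keys and values is what lets step n avoid recomputing the projections for x and y_1 to y_{n-1}; only the current token's query, key, and value are computed fresh at each step.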

