Example

Diagram of the Autoregressive Generation Architectural Flow

This diagram illustrates the architectural flow for a single step of autoregressive generation in a decoder-only Transformer. The process begins with the input sequence, formed by concatenating the prompt tokens (x_0, ..., x_m) and any previously generated tokens (y_1, ..., y_{i-1}). This sequence is fed into an embedding layer. The resulting embeddings are then processed through a stack of L decoder layers, each comprising self-attention and feed-forward network (FFN) modules. The output from the final layer is passed through a linear mapping and a Softmax layer to compute the conditional probability distribution, Pr(·|x, y_{<i}), for the next token.
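The step described above can be made concrete in code. Below is a minimal PyTorch sketch of one generation step; the class name TinyDecoderOnlyLM and all sizes are illustrative assumptions, not the book's reference implementation. It mirrors the diagram: an embedding layer, a stack of L self-attention + FFN layers under a causal mask, a linear mapping, and a Softmax over the vocabulary.

```python
# Minimal sketch of one autoregressive generation step in a
# decoder-only Transformer. Module names and sizes are assumptions
# for illustration only.
import torch
import torch.nn as nn

class TinyDecoderOnlyLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)        # embedding layer
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            batch_first=True)                                 # self-attention + FFN
        self.layers = nn.TransformerEncoder(layer, n_layers)  # stack of L layers
        self.out = nn.Linear(d_model, vocab_size)             # linear mapping

    def forward(self, tokens):
        # tokens: (batch, seq_len) = prompt x concatenated with y_{<i}
        seq_len = tokens.size(1)
        causal = nn.Transformer.generate_square_subsequent_mask(seq_len)
        h = self.embed(tokens)
        h = self.layers(h, mask=causal)         # decoder-only: masked self-attention
        logits = self.out(h[:, -1])             # last position predicts token i
        return torch.softmax(logits, dim=-1)    # Pr(. | x, y_{<i})

# One generation step: feed [x_0..x_m, y_1..y_{i-1}], pick the next token.
model = TinyDecoderOnlyLM()
context = torch.randint(0, 1000, (1, 8))        # stand-in token ids
probs = model(context)                          # distribution over the vocabulary
y_i = torch.argmax(probs, dim=-1)               # greedy choice of the next token
```

In full generation, this step repeats: the chosen token y_i is appended to the context and the model is run again, until an end-of-sequence token is produced or a length limit is reached.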

