Activity (Process)

Processing Flow of Autoregressive Generation in a Decoder-Only Transformer

The process of autoregressive generation in a decoder-only Transformer involves a step-by-step architectural flow. At each generation step i, the input sequence is formed by concatenating the initial prompt x with all previously generated tokens y_{<i}. This combined sequence is first converted into embeddings by an embedding layer. The embeddings are then processed through a stack of L decoder layers, each containing self-attention and feed-forward network (FFN) modules. The output from the final decoder layer undergoes a linear mapping and is then passed to a Softmax layer. This produces the conditional probability distribution, Pr(·|x, y_{<i}), which is used to select the next token y_i.
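The flow described above can be sketched as a minimal, runnable loop. The decoder stack is replaced here by a single toy nonlinear map over random weights (the names `embedding`, `decoder_weights`, `output_proj`, and all sizes are illustrative assumptions, not a real model); what matters is the shape of the procedure: concatenate x with y_{<i}, embed, run the decoder, project the final position, apply Softmax, pick y_i.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 10   # illustrative vocabulary size
D_MODEL = 8       # illustrative model dimension
EOS_ID = 9        # illustrative end-of-sequence token id

# Toy stand-ins for the real modules (random weights, for shape only):
embedding = rng.normal(size=(VOCAB_SIZE, D_MODEL))    # embedding layer
decoder_weights = rng.normal(size=(D_MODEL, D_MODEL)) # stand-in for L decoder layers
output_proj = rng.normal(size=(D_MODEL, VOCAB_SIZE))  # final linear mapping

def softmax(z):
    z = z - z.max()           # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def next_token_distribution(token_ids):
    """Return Pr(.|x, y_<i): embed the full sequence, run the (toy)
    decoder, project the final position to vocab logits, apply Softmax."""
    h = embedding[token_ids]          # (seq_len, D_MODEL)
    h = np.tanh(h @ decoder_weights)  # stand-in for self-attention + FFN stack
    logits = h[-1] @ output_proj      # only the last position predicts y_i
    return softmax(logits)

prompt = [1, 2, 3]   # token ids of the prompt x (arbitrary example)
generated = []
for step in range(5):                 # generation steps i = 1, 2, ...
    seq = prompt + generated          # concatenate x with y_<i
    probs = next_token_distribution(np.array(seq))
    y_i = int(np.argmax(probs))       # greedy selection of the next token
    generated.append(y_i)
    if y_i == EOS_ID:                 # stop if end-of-sequence is produced
        break

print(generated)
```

Greedy argmax is used here for simplicity; in practice the next token may instead be sampled from Pr(·|x, y_{<i}), e.g. with temperature or top-k sampling.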

Updated 2026-04-19

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.5 Inference - Foundations of Large Language Models