Example

Visual Representation of Permuted Language Modeling

This diagram provides a visual representation of Permuted Language Modeling, where a sequence is generated in a non-sequential, permuted order. In this example, the prediction order is $x_0 \rightarrow x_4 \rightarrow x_2 \rightarrow x_1 \rightarrow x_3$. Each row illustrates a step in the generation process, where the blue squares indicate the tokens that have already been generated and are used as context for predicting the next token. The step-by-step conditional probabilities are shown on the right:

  1. Step 1 (Predict $x_0$): The process starts with $x_0$, which is treated as a given starting point. Its probability is set to 1: $\text{Pr}(x_0) = 1$.
  2. Step 2 (Predict $x_4$): The model predicts $x_4$ conditioned on the embedding of $x_0$: $\text{Pr}(x_4|\mathbf{e}_0)$.
  3. Step 3 (Predict $x_2$): The model predicts $x_2$ conditioned on the embeddings of the already generated tokens, $x_0$ and $x_4$: $\text{Pr}(x_2|\mathbf{e}_0, \mathbf{e}_4)$.
  4. Step 4 (Predict $x_1$): The model predicts $x_1$ using the context of $x_0$, $x_4$, and $x_2$: $\text{Pr}(x_1|\mathbf{e}_0, \mathbf{e}_4, \mathbf{e}_2)$.
  5. Step 5 (Predict $x_3$): Finally, the model predicts $x_3$ conditioned on all other tokens: $\text{Pr}(x_3|\mathbf{e}_0, \mathbf{e}_4, \mathbf{e}_2, \mathbf{e}_1)$.
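Multiplying these step-wise conditionals together recovers the joint probability of the entire sequence under this permutation. Writing the product out for this example (with $\mathbf{e}_i$ denoting the embedding of token $x_i$, as above):

$$\text{Pr}(x_0, x_1, x_2, x_3, x_4) = \text{Pr}(x_0) \cdot \text{Pr}(x_4|\mathbf{e}_0) \cdot \text{Pr}(x_2|\mathbf{e}_0, \mathbf{e}_4) \cdot \text{Pr}(x_1|\mathbf{e}_0, \mathbf{e}_4, \mathbf{e}_2) \cdot \text{Pr}(x_3|\mathbf{e}_0, \mathbf{e}_4, \mathbf{e}_2, \mathbf{e}_1)$$

In general, permuted language modeling samples a permutation $\pi$ over the $n$ positions and factorizes $\text{Pr}(\mathbf{x}) = \prod_{i=1}^{n} \text{Pr}(x_{\pi(i)}|x_{\pi(1)}, \ldots, x_{\pi(i-1)})$; the order above is one such sample.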
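To make the bookkeeping concrete, here is a minimal Python sketch of scoring a sequence in this permuted order. The scorer `toy_conditional` is a hypothetical placeholder (not from the original text) that returns random probabilities; in practice it would be a trained network attending only to the already-generated positions.

```python
import math
import random

random.seed(0)
VOCAB = ["a", "b", "c", "d", "e"]

def toy_conditional(target_pos, context):
    """Hypothetical stand-in for a trained model: returns a distribution
    over VOCAB for the token at position `target_pos`, conditioned on the
    (position, token) pairs generated so far."""
    weights = [random.random() for _ in VOCAB]
    total = sum(weights)
    return {tok: w / total for tok, w in zip(VOCAB, weights)}

def permuted_log_likelihood(tokens, order):
    """Score `tokens` under the factorization given by the permutation
    `order`. The first position is treated as given, so its probability
    is 1 (log-probability 0), mirroring Pr(x_0) = 1 above."""
    log_p = 0.0
    context = [(order[0], tokens[order[0]])]  # step 1: x_0 is given
    for pos in order[1:]:
        dist = toy_conditional(pos, context)  # Pr(x_pos | generated so far)
        log_p += math.log(dist[tokens[pos]])
        context.append((pos, tokens[pos]))    # the "blue squares" grow by one
    return log_p

sequence = ["a", "c", "b", "e", "d"]  # x_0 ... x_4
order = [0, 4, 2, 1, 3]               # the permutation from the diagram
print(permuted_log_likelihood(sequence, order))
```

Note how the context grows by exactly one token per step, matching the conditionals listed above.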