Visual Representation of Permuted Language Modeling
This diagram provides a visual representation of Permuted Language Modeling, where a sequence is generated in a non-sequential, permuted order. In this example, the five tokens x_0, x_1, x_2, x_3, x_4 are generated in the permuted order shown in the figure, written generically here as x_{i_1} → x_{i_2} → x_{i_3} → x_{i_4} → x_{i_5}. Each row illustrates a step in the generation process, where the blue squares indicate the tokens that have already been generated and are used as context for predicting the next token. The step-by-step conditional probabilities are shown on the right (a short code sketch of this bookkeeping follows the list):
- Step 1 (Predict x_{i_1}): The process starts with x_{i_1}, which is treated as a given starting point. Its probability is set to 1: Pr(x_{i_1}) = 1.
- Step 2 (Predict x_{i_2}): The model predicts x_{i_2} conditioned on the embedding of x_{i_1}: Pr(x_{i_2} | e_{i_1}).
- Step 3 (Predict x_{i_3}): The model predicts x_{i_3} conditioned on the embeddings of the already generated tokens x_{i_1} and x_{i_2}: Pr(x_{i_3} | e_{i_1}, e_{i_2}).
- Step 4 (Predict x_{i_4}): The model predicts x_{i_4} using the context of x_{i_1}, x_{i_2}, and x_{i_3}: Pr(x_{i_4} | e_{i_1}, e_{i_2}, e_{i_3}).
- Step 5 (Predict x_{i_5}): Finally, the model predicts x_{i_5} conditioned on all other tokens: Pr(x_{i_5} | e_{i_1}, e_{i_2}, e_{i_3}, e_{i_4}).
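To make the bookkeeping concrete, here is a minimal, self-contained Python sketch of this factorization. It is illustrative only: embed and predict_token_prob are hypothetical stand-ins for a real model's token embeddings and output distribution, and the permuted order is an arbitrary example; only the accumulation loop mirrors the five steps above.

```python
import math
import random

VOCAB_SIZE = 8
EMB_DIM = 4

def embed(token_id):
    """Stand-in for the embedding e_i: a fixed pseudo-random vector per token."""
    rng = random.Random(token_id)
    return [rng.uniform(-1.0, 1.0) for _ in range(EMB_DIM)]

def predict_token_prob(target_id, context_embeddings):
    """Toy stand-in for the model's conditional Pr(x_target | context):
    a softmax over an arbitrary deterministic score per vocabulary item."""
    base = sum(sum(e) for e in context_embeddings)
    exps = [math.exp(math.sin(tok + base)) for tok in range(VOCAB_SIZE)]
    return exps[target_id] / sum(exps)

def permuted_lm_log_prob(tokens, order):
    """log Pr(x) factorized along `order`, mirroring the five steps above:
    the first token in the order is given (probability 1), and each later
    token is predicted from the embeddings of the already generated ones."""
    log_p = 0.0    # Step 1: log Pr(x_{i_1}) = log 1 = 0
    context = []   # embeddings of already generated tokens (the blue squares)
    for step, pos in enumerate(order):
        if step > 0:  # Steps 2..n: condition on the current context
            log_p += math.log(predict_token_prob(tokens[pos], context))
        context.append(embed(tokens[pos]))
    return log_p

tokens = [3, 7, 1, 5, 2]   # token ids for x_0 .. x_4
order = [2, 0, 3, 1, 4]    # one possible permuted generation order
print(permuted_lm_log_prob(tokens, order))
```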

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Comparison of Arbitrary Order Prediction and Masked Language Modeling
Visual Representation of Permuted Language Modeling
A language model is tasked with generating a four-token sequence, originally ordered as (x_0, x_1, x_2, x_3). Instead of a standard left-to-right approach, the model generates the tokens in the following arbitrary order: x_2 → x_0 → x_3 → x_1. Given this generation order, which expression correctly represents the conditional probability for predicting the final token, x_1? (Note: e_i represents the embedding of token x_i.)
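Read against the step scheme in the figure above: x_1 is generated last in the order x_2 → x_0 → x_3 → x_1, so all three other tokens are already available as context, and the conditional takes the form Pr(x_1 | e_2, e_0, e_3).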
Contextual Advantages of Non-Sequential Token Generation
A language model is generating a five-token sequence, originally ordered as (x_0, x_1, x_2, x_3, x_4). The model generates the tokens in the following arbitrary order: x_3 → x_1 → x_4 → x_0 → x_2. Arrange the conditional probability terms below to correctly represent the joint probability factorization for this specific generation order. (Note: e_i represents the embedding of token x_i.)
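The factorization for an ordering like this can be read off mechanically. Below is a small Python helper (a hypothetical name, following the same Pr/e_i notation used above) that prints the conditional term for each generation step:

```python
def factorization_terms(order):
    """One conditional-probability term per generation step: step k predicts
    x_{order[k]} given the embeddings of the tokens from steps 0 .. k-1."""
    terms = []
    for step, pos in enumerate(order):
        if step == 0:
            terms.append(f"Pr(x_{pos})")
        else:
            context = ", ".join(f"e_{p}" for p in order[:step])
            terms.append(f"Pr(x_{pos} | {context})")
    return terms

# The order from the question: x_3 -> x_1 -> x_4 -> x_0 -> x_2
print(" * ".join(factorization_terms([3, 1, 4, 0, 2])))
# Pr(x_3) * Pr(x_1 | e_3) * Pr(x_4 | e_3, e_1) * Pr(x_0 | e_3, e_1, e_4)
#   * Pr(x_2 | e_3, e_1, e_4, e_0)
```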
Learn After
A language model is tasked with generating a five-token sequence (x_0, x_1, x_2, x_3, x_4) in a specific permuted order. At each step, the model predicts the next token in the permutation using the embeddings (e.g., e_i for token x_i) of all previously generated tokens as context. Which of the following correctly represents the conditional probability for the third step of this generation process?
A language model generates a four-token sequence (x_0, x_1, x_2, x_3) using a specific permuted order. Arrange the following conditional probability expressions to match this generation sequence, where e_i represents the embedding of token x_i.
A language model is generating a five-token sequence (x_0, x_1, x_2, x_3, x_4) using a permuted, non-sequential order. At a specific step in the generation process, the model calculates the probability of a token conditioned on the embeddings of certain other tokens, where e_i is the embedding of token x_i. Based only on this information, what can be definitively concluded about the generation process?
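A useful invariant for questions like the last one: under the scheme illustrated above, the prediction at step k conditions on exactly k − 1 embeddings. So a conditional of the form Pr(x_j | e_a, e_b), for example, can only arise at the third generation step, with x_a and x_b (and no other tokens) generated before x_j; the original positions of the tokens say nothing about when they are generated.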