A language model is tasked with generating a four-token sequence, originally ordered as (x_0, x_1, x_2, x_3). Instead of a standard left-to-right approach, the model generates the tokens in the following arbitrary order: x_2 → x_0 → x_3 → x_1. Given this generation order, which expression correctly represents the conditional probability for predicting the final token, x_1? (Note: e_i represents the embedding of token x_i)
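As a refresher before answering (using a smaller sequence so the answer above is not given away): by the chain rule applied over the *generation* order rather than the positional order, a three-token sequence generated as x_1 → x_0 → x_2 factorizes as

```latex
P(x_0, x_1, x_2) = P(x_1)\, P(x_0 \mid x_1)\, P(x_2 \mid x_1, x_0)
```

Each conditional conditions on all previously generated tokens (together with their position embeddings e_i), regardless of their original positions in the sequence.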
Ch.1 Pre-training - Foundations of Large Language Models
A language model is generating a five-token sequence, originally ordered as (x_0, x_1, x_2, x_3, x_4). The model generates the tokens in the following arbitrary order: x_3 → x_1 → x_4 → x_0 → x_2. Arrange the conditional probability terms below to correctly represent the joint probability factorization for this specific generation order. (Note: e_i represents the embedding of token x_i.)
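To check an arrangement mechanically, the chain-rule factorization for any generation order can be enumerated in a few lines. This is an illustrative sketch (the function name and a generic three-token order are our own, not from the course material); each term conditions on exactly the tokens generated before it.

```python
def factorization_terms(order):
    """Return the conditional-probability terms, in generation order,
    for a sequence generated in the given permutation of positions."""
    terms = []
    seen = []  # positions generated so far
    for i in order:
        if seen:
            cond = ", ".join(f"x_{j}" for j in seen)
            terms.append(f"P(x_{i} | {cond})")
        else:
            terms.append(f"P(x_{i})")  # first token has no conditioning context
        seen.append(i)
    return terms

# Generic example with the order x_1 -> x_0 -> x_2:
print(" * ".join(factorization_terms([1, 0, 2])))
# → P(x_1) * P(x_0 | x_1) * P(x_2 | x_1, x_0)
```

The same helper applied to the five-token order in the question yields the sequence of terms to arrange.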