A language model is tasked with generating a four-token sequence, originally ordered as (x_0, x_1, x_2, x_3). Instead of a standard left-to-right approach, the model generates the tokens in the following arbitrary order: x_2 → x_0 → x_3 → x_1. Given this generation order, which expression correctly represents the conditional probability for predicting the final token, x_1? (Note: e_i represents the embedding of token x_i)
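As a refresher before answering (using a smaller sequence so the answer above is not given away): by the chain rule applied over the *generation* order rather than the positional order, a three-token sequence generated as x_1 → x_0 → x_2 factorizes as

```latex
P(x_0, x_1, x_2) = P(x_1)\, P(x_0 \mid x_1)\, P(x_2 \mid x_1, x_0)
```

Each conditional conditions on all previously generated tokens (together with their position embeddings e_i), regardless of their original positions in the sequence.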
Ch.1 Pre-training - Foundations of Large Language Models
A language model is generating a five-token sequence, originally ordered as (x_0, x_1, x_2, x_3, x_4). The model generates the tokens in the following arbitrary order: x_3 → x_1 → x_4 → x_0 → x_2. Arrange the conditional probability terms below to correctly represent the joint probability factorization for this specific generation order. (Note: e_i represents the embedding of token x_i.)
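To check an arrangement mechanically, the chain-rule factorization for any generation order can be enumerated in a few lines. This is an illustrative sketch (the function name and a generic three-token order are our own, not from the course material); each term conditions on exactly the tokens generated before it.

```python
def factorization_terms(order):
    """Return the conditional-probability terms, in generation order,
    for a sequence generated in the given permutation of positions."""
    terms = []
    seen = []  # positions generated so far
    for i in order:
        if seen:
            cond = ", ".join(f"x_{j}" for j in seen)
            terms.append(f"P(x_{i} | {cond})")
        else:
            terms.append(f"P(x_{i})")  # first token has no conditioning context
        seen.append(i)
    return terms

# Generic example with the order x_1 -> x_0 -> x_2:
print(" * ".join(factorization_terms([1, 0, 2])))
# → P(x_1) * P(x_0 | x_1) * P(x_2 | x_1, x_0)
```

The same helper applied to the five-token order in the question yields the sequence of terms to arrange.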