An auto-regressive neural network is calculating the joint probability of the token sequence (x_0, x_1, x_2, x_3). To do this, it must compute the conditional probability for the final token, expressed as Pr(x_3 | x_0, x_1, x_2). Which statement best analyzes how the neural network practically implements this probabilistic conditioning?
0
1
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Probability Factorization for Arbitrary Order Token Prediction
Causal Language Modeling
An auto-regressive neural network is calculating the joint probability of the token sequence
(x_0, x_1, x_2, x_3). To do this, it must compute the conditional probability for the final token, expressed asPr(x_3 | x_0, x_1, x_2). Which statement best analyzes how the neural network practically implements this probabilistic conditioning?Neural Network Probability Factorization
An auto-regressive neural network is tasked with calculating the total probability of the three-token sequence
(x_0, x_1, x_2). Arrange the following computational steps in the correct chronological order that the model would follow, wheree_irepresents the embedding for tokenx_i.