Neural Network Probability Factorization
An auto-regressive neural network is processing the token sequence (the, cat, sat). Using the notation e_token to represent the embedding for a given token, write out the full factorization of the joint probability Pr(the, cat, sat) as it would be computed by the model. Do not include a start-of-sequence token.
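As a minimal sketch, and not a reference implementation, the Python below computes Pr(the, cat, sat) by the chain rule. Each factor is conditioned on the embeddings of the tokens before it. Everything here is a hypothetical stand-in for a trained network: the toy vocabulary, the 2-d embeddings `EMBED`, the weights `W`, `PRIOR_LOGITS`, and the mean-pooling function `next_token_dist`. Because no start-of-sequence token is used, the first factor Pr(the) comes from an unconditioned prior.

```python
import math

# Hypothetical toy vocabulary and 2-d embeddings e_token (illustrative values only).
VOCAB = ["the", "cat", "sat"]
EMBED = {
    "the": [1.0, 0.0],
    "cat": [0.0, 1.0],
    "sat": [0.5, 0.5],
}

# Hypothetical output weights: one row per vocabulary token.
W = {
    "the": [0.2, -0.1],
    "cat": [1.5, 0.3],
    "sat": [-0.4, 2.0],
}

# Hypothetical logits for the first position, where the prefix is empty
# (no start-of-sequence token, so the first factor is unconditioned).
PRIOR_LOGITS = {"the": 2.0, "cat": 0.5, "sat": 0.1}


def softmax(logits):
    """Turn a dict of logits into a probability distribution over VOCAB."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: v / z for t, v in exps.items()}


def next_token_dist(prefix_embeddings):
    """Stand-in for the network: returns Pr(. | e_0, ..., e_{i-1}).

    Empty prefix: use PRIOR_LOGITS. Otherwise mean-pool the prefix
    embeddings and score each vocabulary token with a linear layer.
    """
    if not prefix_embeddings:
        return softmax(PRIOR_LOGITS)
    dim = len(prefix_embeddings[0])
    h = [sum(e[d] for e in prefix_embeddings) / len(prefix_embeddings)
         for d in range(dim)]
    logits = {t: sum(w * x for w, x in zip(W[t], h)) for t in VOCAB}
    return softmax(logits)


def joint_probability(tokens):
    """Chain rule: Pr(x_0, ..., x_m) = prod_i Pr(x_i | e_0, ..., e_{i-1})."""
    prob = 1.0
    factors = []
    for i, tok in enumerate(tokens):
        prefix = [EMBED[t] for t in tokens[:i]]  # embeddings of earlier tokens only
        p = next_token_dist(prefix)[tok]
        factors.append(p)
        prob *= p
    return prob, factors


p, factors = joint_probability(["the", "cat", "sat"])
# factors[0] = Pr(the), factors[1] = Pr(cat | e_the), factors[2] = Pr(sat | e_the, e_cat)
print(f"Pr(the, cat, sat) = {p:.4f}")
```

A Transformer would replace mean pooling with causal self-attention over the prefix. The factorization itself stays the same. In practice, models usually sum log-probabilities rather than multiplying raw probabilities, which avoids numerical underflow on long sequences.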
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Probability Factorization for Arbitrary Order Token Prediction
Causal Language Modeling
An auto-regressive neural network is calculating the joint probability of the token sequence (x_0, x_1, x_2, x_3). To do this, it must compute the conditional probability for the final token, expressed as Pr(x_3 | x_0, x_1, x_2). Which statement best analyzes how the neural network practically implements this probabilistic conditioning?
Neural Network Probability Factorization
An auto-regressive neural network is tasked with calculating the joint probability of the three-token sequence (x_0, x_1, x_2). Arrange the following computational steps in the correct chronological order that the model would follow, where e_i represents the embedding for token x_i.