Learn Before
Calculating Sequence Probability with a Start Token
A language model is calculating the probability for the three-token sequence (x_0, x_1, x_2). The first token, x_0, is a fixed, deterministic start-of-sequence symbol. The model provides the following conditional probabilities: Pr(x_1 | x_0) = 0.2 and Pr(x_2 | x_0, x_1) = 0.5. What is the joint probability of the entire sequence, Pr(x_0, x_1, x_2)? Explain your reasoning.
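One way to check the reasoning is a minimal sketch of the chain rule, assuming that a fixed, deterministic start symbol means Pr(x_0) = 1. The function name `sequence_probability` and its `start_prob` parameter are hypothetical and chosen only for illustration.

```python
import math


def sequence_probability(conditionals, start_prob=1.0):
    """Chain rule: Pr(x_0, ..., x_m) = Pr(x_0) * product of Pr(x_i | x_0, ..., x_{i-1}).

    start_prob defaults to 1.0 on the assumption that x_0 is a
    deterministic start-of-sequence symbol.
    """
    return start_prob * math.prod(conditionals)


# Values from the question: Pr(x_1 | x_0) = 0.2 and Pr(x_2 | x_0, x_1) = 0.5
joint = sequence_probability([0.2, 0.5])
print(joint)  # 0.1
```

Under that assumption the start token contributes a factor of 1, so the joint probability reduces to the product of the two conditionals: 1 × 0.2 × 0.5 = 0.1.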
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Log-Likelihood Objective for Language Model Training
A language model calculates the joint probability of a sequence of tokens (x_0, x_1, ..., x_m). The first token, x_0, is a special, deterministic start-of-sequence symbol. How does the nature of this specific first token typically affect the overall calculation of the sequence's joint probability?
Calculating Sequence Probability with a Start Token
Analyzing a Language Model's Sequence Probability