Learn Before
Joint Probability of a Generated Sequence using the Chain Rule
The joint probability of generating a specific sequence of tokens, such as ⟨s⟩ a b c d, is calculated by applying the chain rule. The chain rule decomposes the joint probability into a product of conditional probabilities, in which each token's probability is conditioned on all of the tokens that precede it. For this example, the expansion is:
Pr(⟨s⟩, a, b, c, d) = Pr(⟨s⟩) · Pr(a | ⟨s⟩) · Pr(b | ⟨s⟩, a) · Pr(c | ⟨s⟩, a, b) · Pr(d | ⟨s⟩, a, b, c)
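A minimal Python sketch of this computation, assuming the model has already produced each token's probability conditioned on its full prefix. The function names `joint_probability` and `joint_log_probability` and the numbers in `probs` are hypothetical, chosen only for illustration; they do not come from a real model.

```python
import math
from typing import Sequence


def joint_probability(cond_probs: Sequence[float]) -> float:
    """Chain rule: multiply Pr(x_i | all earlier tokens) over every position i.

    Each cond_probs[i] is assumed to already be conditioned on every token
    before position i (the first entry has no preceding context).
    """
    return math.prod(cond_probs)


def joint_log_probability(cond_probs: Sequence[float]) -> float:
    """The same chain rule in log space: a sum of log-probabilities."""
    return sum(math.log(p) for p in cond_probs)


# Hypothetical conditional probabilities for <s> a b c d, in chain-rule order:
# Pr(<s>), Pr(a | <s>), Pr(b | <s>, a), Pr(c | <s>, a, b), Pr(d | <s>, a, b, c)
probs = [1.0, 0.2, 0.5, 0.25, 0.4]

print(joint_probability(probs))
print(math.exp(joint_log_probability(probs)))  # same value, computed via logs
```

Working in log space turns the product into a sum, which is the usual way to keep long sequences from underflowing to zero. A zero conditional probability has no finite logarithm, however, so `math.log` raises a `ValueError` in that case.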
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Base Case for Sequence Probability
Joint Probability of a Generated Sequence using the Chain Rule
Relationship Between Joint, Conditional, and Marginal Log-Probabilities of Sequences
Derivation of Sequence Log-Probability via Chain Rule
Logarithmic Form of the Chain Rule for Sequence Probability
Formula for an Impossible Initial Event
A language model is tasked with calculating the joint probability of the three-token sequence 'the cat sat'. The model provides the following probability estimates:
- The probability of the first token is Pr("the") = 0.1.
- The probability of the second token, given the first, is Pr("cat" | "the") = 0.5.
- The probability of the third token, given the first two, is Pr("sat" | "the", "cat") = 0.8.
Using the principle that the joint probability of a sequence is the product of the conditional probabilities of its components, what is the joint probability Pr("the", "cat", "sat")?
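The arithmetic in this question can be checked directly. The three values are the ones stated in the question; the variable names are illustrative only.

```python
# Conditional probabilities stated in the question, in chain-rule order.
p_the = 0.1  # Pr("the")
p_cat = 0.5  # Pr("cat" | "the")
p_sat = 0.8  # Pr("sat" | "the", "cat")

# Chain rule: the joint probability is the product of the three factors.
joint = p_the * p_cat * p_sat  # Pr("the", "cat", "sat")

# Rounding hides floating-point noise in the last digits.
print(round(joint, 4))
```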
Computational Stability of Sequence Probability
Which of the following expressions correctly decomposes the joint probability of a four-token sequence (x₁, x₂, x₃, x₄) using the chain rule of probability?
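For reference, the general chain-rule pattern for n tokens, and its instance for n = 4 in the x₁, …, x₄ notation of the question, can be written as follows (each factor conditions on every earlier token, exactly as in the ⟨s⟩ a b c d expansion):

```latex
\Pr(x_1, \dots, x_n) = \prod_{i=1}^{n} \Pr(x_i \mid x_1, \dots, x_{i-1})

\Pr(x_1, x_2, x_3, x_4) = \Pr(x_1)\,\Pr(x_2 \mid x_1)\,\Pr(x_3 \mid x_1, x_2)\,\Pr(x_4 \mid x_1, x_2, x_3)
```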
Learn After
A language model is generating a sequence of words. Given the following conditional probabilities, what is the joint probability of the model generating the exact sequence 'The cat sat'?
- The probability of starting with 'The' is 0.4.
- The probability of 'cat' following 'The' is 0.5.
- The probability of 'sat' following 'The cat' is 0.9.
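A short sketch that computes this joint probability both as a direct product and in log space, to show that the two forms agree. The probabilities are the ones listed in the question; the variable names are hypothetical.

```python
import math

# Conditional probabilities from the question, in chain-rule order:
# Pr("The"), Pr("cat" | "The"), Pr("sat" | "The cat")
cond_probs = [0.4, 0.5, 0.9]

# Direct product of the chain-rule factors.
joint = math.prod(cond_probs)

# Equivalent log-space form: sum the log-probabilities, then exponentiate.
log_joint = sum(math.log(p) for p in cond_probs)

print(f"joint = {joint:.2f}, log joint = {log_joint:.4f}")
print(math.isclose(joint, math.exp(log_joint)))
```

For a three-token sequence the product is perfectly safe; the log-space form matters mainly for long sequences, where multiplying many small factors can underflow.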
Applying the Chain Rule to a Sequence
Comparing Language Model Sequence Probabilities