Learn Before
Formula for an Impossible Initial Event
The formula stating that the initial event (or token) x₀ in a sequence is an impossible event is:

Pr(x₀) = 0
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Base Case for Sequence Probability
Joint Probability of a Generated Sequence using the Chain Rule
Relationship Between Joint, Conditional, and Marginal Log-Probabilities of Sequences
Derivation of Sequence Log-Probability via Chain Rule
Logarithmic Form of the Chain Rule for Sequence Probability
Formula for an Impossible Initial Event
A language model is tasked with calculating the joint probability of the three-token sequence 'the cat sat'. The model provides the following probability estimates:
- The probability of the first token is Pr("the") = 0.1
- The probability of the second token, given the first, is Pr("cat" | "the") = 0.5
- The probability of the third token, given the first two, is Pr("sat" | "the", "cat") = 0.8
Using the principle that the joint probability of a sequence is the product of the conditional probabilities of its components, what is the joint probability Pr("the", "cat", "sat")?
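The product described in this question can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the variable names (conditional_probs, joint_prob, joint_log_prob) are made up for the example, and the numbers are the three estimates quoted in the question.

```python
import math

# Conditional probabilities from the question, in sequence order:
# Pr("the"), Pr("cat" | "the"), Pr("sat" | "the", "cat")
conditional_probs = [0.1, 0.5, 0.8]

# Chain rule: the joint probability is the product of the conditionals.
joint_prob = math.prod(conditional_probs)

# Equivalent log-space form: sum the log-probabilities, then exponentiate.
joint_log_prob = sum(math.log(p) for p in conditional_probs)

print(joint_prob)                # approximately 0.04
print(math.exp(joint_log_prob))  # agrees with the product up to rounding
```

Summing log-probabilities gives the same result as multiplying the probabilities directly, which is why the logarithmic form is commonly preferred for long sequences, where a product of many small factors can underflow.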
Computational Stability of Sequence Probability
Which of the following expressions correctly decomposes the joint probability of a four-token sequence (x₁, x₂, x₃, x₄) using the chain rule of probability?
Learn After
Implication of an Impossible Initial Event
A language model calculates the probability of a sequence of three tokens, {x₀, x₁, x₂}, using the formula: Pr(x₀, x₁, x₂) = Pr(x₀) * Pr(x₁|x₀) * Pr(x₂|x₀, x₁). If the model determines that the initial token, x₀, is an impossible event, what is the joint probability of the entire sequence?
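The effect of a zero factor in the formula above can be checked numerically. The sketch below is illustrative only: the helper names (joint_probability, joint_log_probability) and the two non-zero values are hypothetical stand-ins for whatever a model might report, and the only assumption taken from the question is Pr(x₀) = 0.

```python
import math

def joint_probability(conditional_probs):
    """Multiply the chain-rule factors Pr(x0), Pr(x1|x0), Pr(x2|x0,x1), ..."""
    return math.prod(conditional_probs)

def joint_log_probability(conditional_probs):
    """Sum the log-probabilities; a zero factor is mapped to -inf
    explicitly, because math.log(0.0) would raise ValueError."""
    return sum(math.log(p) if p > 0.0 else -math.inf for p in conditional_probs)

# Hypothetical values: Pr(x0) = 0 (impossible initial event),
# followed by two high conditional probabilities.
probs = [0.0, 0.9, 0.95]

joint = joint_probability(probs)
log_joint = joint_log_probability(probs)

print(joint)      # the zero factor forces the whole product to zero
print(log_joint)  # negative infinity in log space
```

In this sketch, however large the later factors are, they cannot compensate for the zero contributed by the first one.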
Consequence of an Impossible Starting Token
A language model is calculating the probability of the sequence 'Zxq#w the cat sat'. If the model's vocabulary does not contain the token 'Zxq#w', making its initial probability zero, the model can still assign a non-zero probability to the entire sequence by considering the high probabilities of the subsequent words 'the', 'cat', and 'sat'.