Learn Before
Explaining Zero Probability Sequences
A language model is tasked with calculating the probability of a specific sequence of words. The model's internal data indicates that the very first word in this sequence is impossible, meaning it has a probability of 0. Without knowing the probabilities of any of the other words in the sequence, explain the definitive conclusion you can draw about the joint probability of the entire sequence and the mathematical reasoning behind it.
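The reasoning can be sketched in a few lines of Python. This is a minimal illustration with hypothetical probability values, assuming the standard chain-rule factorization Pr(x₀) · Pr(x₁|x₀) · Pr(x₂|x₀, x₁) · …; it shows that a single zero factor forces the entire product to zero:

```python
from functools import reduce

def joint_probability(factors):
    """Multiply chain-rule factors Pr(x0), Pr(x1|x0), ... into a joint probability."""
    return reduce(lambda acc, p: acc * p, factors, 1.0)

# The first word is impossible; the later conditionals cannot change the outcome.
factors = [0.0, 0.9, 0.99, 0.95]  # hypothetical values
print(joint_probability(factors))  # 0.0 — any zero factor zeroes the product
```

Because multiplication by 0 yields 0, the values of the remaining factors are irrelevant: the joint probability of the sequence is definitively 0.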
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model calculates the joint probability of a sequence of events (e.g., words) by multiplying the probability of the first event by the conditional probabilities of each subsequent event. Given the following probabilities for a three-event sequence (x₀, x₁, x₂), what is the joint probability of the entire sequence?
- Probability of the first event, Pr(x₀) = 0.0
- Probability of the second event given the first, Pr(x₁|x₀) = 0.4
- Probability of the third event given the first two, Pr(x₂|x₀, x₁) = 0.8
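With the factorization stated above, the joint probability is just the product of the three given factors. A quick sketch with those numbers:

```python
# Chain rule: Pr(x0, x1, x2) = Pr(x0) * Pr(x1|x0) * Pr(x2|x0, x1)
p_x0 = 0.0
p_x1_given_x0 = 0.4
p_x2_given_x0_x1 = 0.8

joint = p_x0 * p_x1_given_x0 * p_x2_given_x0_x1
print(joint)  # 0.0 — the zero first factor dominates the product
```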
Language Model Debugging Scenario
A language model is generating a sequence of words. The first word has a probability of 0, yet the conditional probabilities for all subsequent words are very high (e.g., 0.99 each). A flawed line of reasoning concludes that the high probabilities of the later words can overcome the initial zero, yielding a non-zero joint probability for the sequence. In fact they cannot: the chain rule multiplies the factors together, and any product containing a factor of 0 is exactly 0, so the joint probability of the entire sequence is 0 regardless of how probable the later words are.