Calculating Sequence Log-Probability
A language model assigns the following conditional log-probabilities to the three-token sequence ('the', 'cat', 'sat'):
log Pr('the') = -1.5
log Pr('cat' | 'the') = -2.0
log Pr('sat' | 'the', 'cat') = -1.2
Using the chain-rule decomposition of a sequence's probability into a product of conditional probabilities, calculate the log-probability of the entire sequence, log Pr('the', 'cat', 'sat').
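The arithmetic can be checked with a short sketch. It assumes only the chain rule: the joint probability is the product of the conditional probabilities, so the joint log-probability is the sum of the conditional log-probabilities. The names `token_log_probs` and `sequence_log_prob` are illustrative and do not come from the course material.

```python
import math

# Conditional log-probabilities given in the question, in token order.
token_log_probs = [
    ("the", -1.5),  # log Pr('the')
    ("cat", -2.0),  # log Pr('cat' | 'the')
    ("sat", -1.2),  # log Pr('sat' | 'the', 'cat')
]


def sequence_log_prob(log_probs):
    """Sum per-token conditional log-probabilities (chain rule in log space)."""
    return math.fsum(log_probs)


total = sequence_log_prob(lp for _, lp in token_log_probs)
print(f"log Pr('the', 'cat', 'sat') = {total:.1f}")
```

The sum does not depend on the base of the logarithm, as long as all three values use the same base.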
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Log-Likelihood of a Sequence
When calculating the probability of a long sequence of words, the standard approach involves multiplying many conditional probabilities, each of which is a value between 0 and 1. This product is often converted into a sum by applying the logarithm to each term. What is the primary computational reason for this transformation?
A language model calculates the probability of a sequence of tokens as a product of conditional probabilities, one per token given the tokens before it. To improve numerical stability and simplify calculations, this product is converted into a sum by taking the logarithm. Which of the following expressions correctly represents the log-probability of the sequence?