Learn Before
Log Probabilities
In computational models (such as n-gram language models), multiplying many small probabilities can cause arithmetic underflow: the product becomes too small to be represented by standard floating-point numbers and rounds to zero. To prevent this, computations are performed in log space. Because the logarithm of a product equals the sum of the logarithms, multiplying probabilities becomes adding their log probabilities. This sum stays within a representable range, which maintains numerical stability:

$$\log(p_1 \times p_2 \times \cdots \times p_n) = \log p_1 + \log p_2 + \cdots + \log p_n$$
Values only need to be converted back to raw probabilities (using the exponential function) at the very end of the process, if necessary.
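The following minimal Python sketch shows the underflow problem and the log-space fix side by side. The per-word probabilities are hypothetical: the illustrative values 0.05, 0.1 and 0.02 repeated to form a 402-word sequence. The function names are illustrative and do not come from any library. Natural logarithms are assumed, although any base works if it is used consistently.

```python
import math


def product_of_probs(probs):
    """Multiply raw probabilities directly (prone to underflow)."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def sum_of_log_probs(probs):
    """Add natural-log probabilities instead of multiplying raw ones."""
    return sum(math.log(p) for p in probs)


# Hypothetical per-word probabilities for a 402-word sequence.
probs = [0.05, 0.1, 0.02] * 134

raw = product_of_probs(probs)
log_total = sum_of_log_probs(probs)

print(raw)        # 0.0: the true value (about 1e-536) underflowed
print(log_total)  # a finite negative number, about -1234

# Converting back with exp is only safe when the result is representable;
# for this long sequence math.exp(log_total) would underflow to 0.0 as well,
# so the value is reported as a base-10 exponent instead.
print(log_total / math.log(10))  # about -536, i.e. p is roughly 10**-536

# For a short sequence, converting back recovers the raw product.
short = [0.05, 0.1, 0.02]
print(math.exp(sum_of_log_probs(short)), product_of_probs(short))
```

Since the logarithm is monotonic, two sequences can be compared directly by their log-probability sums: the larger (less negative) sum belongs to the more likely sequence, and no conversion back to raw probabilities is required.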
Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Learn After
A language model is designed to calculate the probability of a long sentence by sequentially multiplying the conditional probabilities of each word. Each individual word probability is a small floating-point number (e.g., 0.05, 0.1, 0.02). During testing on sentences with over 100 words, the model consistently outputs a final probability of 0.0, even though no single word has a probability of zero. What is the most likely technical reason for this incorrect result?
Comparing Sequence Probabilities in Log Space
Evaluating Sequence Likelihood with Log Probabilities
Logarithmic Form of the Chain Rule for Sequence Probability
Derivation of Sequence Log-Probability via Chain Rule
Sequence Evaluation using Log-Probability