When calculating the probability of a long sequence of words, the standard approach involves multiplying many conditional probabilities, each of which is a value between 0 and 1. This product is often converted into a sum by applying the logarithm to each term. What is the primary computational reason for this transformation?
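The underflow problem the question points at can be seen directly in a short sketch. The per-token probabilities below (2000 tokens, each with probability 0.01) are illustrative assumptions, not values from any real model:

```python
import math

# Hypothetical per-token conditional probabilities: 2000 tokens,
# each assigned probability 0.01 (an illustrative assumption).
probs = [0.01] * 2000

# Direct product underflows: 0.01**2000 = 1e-4000 is far below the
# smallest positive double (~5e-324), so the result collapses to 0.0.
product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0 — the sequence score is lost entirely

# Summing logs keeps the value in a representable range.
log_prob = sum(math.log(p) for p in probs)
print(log_prob)  # ≈ -9210.34, a perfectly usable log-probability
```

The sum of logs stays within the exponent range of floating-point doubles no matter how long the sequence grows, which is the primary computational motivation for the transformation.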
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Log-Likelihood of a Sequence
A language model calculates the probability of a sequence of tokens, $x_1, x_2, \dots, x_n$, using the product of conditional probabilities: $P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_1, \dots, x_{i-1})$. To improve numerical stability and simplify calculations, this product is converted into a sum by taking the logarithm. Which of the following expressions correctly represents the log-probability, $\log P(x_1, \dots, x_n)$?
Calculating Sequence Log-Probability