Example

Joint Probability of a Generated Sequence using the Chain Rule

The joint probability of generating a specific sequence of tokens, such as ⟨s⟩ a b c d, is calculated by applying the chain rule. This method decomposes the joint probability into a product of conditional probabilities, where each token's probability is conditioned on all the tokens that precede it. The expanded formula for this example is:

$$\Pr(\langle s \rangle\, a\, b\, c\, d) = \Pr(\langle s \rangle) \cdot \Pr(a \mid \langle s \rangle) \cdot \Pr(b \mid \langle s \rangle\, a) \cdot \Pr(c \mid \langle s \rangle\, a\, b) \cdot \Pr(d \mid \langle s \rangle\, a\, b\, c)$$
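The chain-rule product for this sequence can be sketched in a few lines of Python. This is a minimal illustration, not a real language model: the table `COND_PROB`, its probability values, and the function name `joint_log_prob` are all hypothetical, and the sketch assumes every sequence begins with ⟨s⟩, so that Pr(⟨s⟩) = 1.

```python
import math

# Hypothetical table of Pr(token | context). All values are made up
# for illustration and do not come from any trained model.
COND_PROB = {
    ((), "<s>"): 1.0,  # assumption: every sequence starts with <s>
    (("<s>",), "a"): 0.5,
    (("<s>", "a"), "b"): 0.4,
    (("<s>", "a", "b"), "c"): 0.25,
    (("<s>", "a", "b", "c"), "d"): 0.8,
}


def joint_log_prob(tokens):
    """Sum log Pr(token_i | all preceding tokens) over the sequence."""
    total = 0.0
    for i, token in enumerate(tokens):
        context = tuple(tokens[:i])  # every token before position i
        total += math.log(COND_PROB[(context, token)])
    return total


sequence = ["<s>", "a", "b", "c", "d"]
log_p = joint_log_prob(sequence)
joint_p = math.exp(log_p)  # back to an ordinary probability
print(f"log-probability: {log_p:.4f}, probability: {joint_p:.4f}")
```

With these invented values the product is 1.0 · 0.5 · 0.4 · 0.25 · 0.8, roughly 0.04. The sketch sums log-probabilities rather than multiplying the factors directly because a product of many small probabilities can underflow to zero in floating point for long sequences.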

Updated 2025-10-10

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences