Formula

Initial Token Probability Assumption

When modeling sequence probabilities, it is commonly assumed that the initial token (the case $i=0$) is fixed, so its probability is 1: $\Pr(x_i|x_0,...,x_{i-1}) = \Pr(x_0) = 1$. As a consequence of this assumption, the joint probability of the full token sequence simplifies. Specifically, $\Pr(x_0,...,x_m) = \Pr(x_0)\Pr(x_1,...,x_m|x_0)$ reduces to $\Pr(x_1,...,x_m|x_0)$ because the initial token's probability is $1$.
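The simplification can be checked numerically. The following is a minimal sketch, not part of the original card: the function name `sequence_log_prob` and the probability values are hypothetical, chosen only for illustration. It assumes the joint probability factorizes by the chain rule into per-token conditionals $\Pr(x_i|x_0,...,x_{i-1})$, with the first entry set to 1.

```python
import math

def sequence_log_prob(cond_probs):
    """Sum of log Pr(x_i | x_0, ..., x_{i-1}) over the given conditionals."""
    return sum(math.log(p) for p in cond_probs)

# Hypothetical conditionals for a 4-token sequence x_0, ..., x_3.
# The first entry is Pr(x_0), set to 1 under the initial-token assumption.
cond_probs = [1.0, 0.5, 0.25, 0.8]

# Joint probability Pr(x_0, ..., x_3): all factors, including Pr(x_0).
joint = math.exp(sequence_log_prob(cond_probs))

# Conditional probability Pr(x_1, ..., x_3 | x_0): the Pr(x_0) factor is dropped.
conditional = math.exp(sequence_log_prob(cond_probs[1:]))

print(joint, conditional)
```

Because $\log 1 = 0$, the initial token adds nothing to the log-probability sum, so both quantities agree. Working in log space is a design choice made here to avoid underflow on long sequences.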


Updated 2026-04-18


Tags

Foundations of Large Language Models

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences