Definition

Notational Convention for Autoregressive Conditional Probability

In autoregressive models, the notation for the conditional probability of a token, Pr(y_i|x, y_{<i}), is a common shorthand. It signifies the probability of token y_i conditioned on the single sequence formed by concatenating the input x with the preceding output tokens y_{<i}. A more explicit, but less frequently used, notation for this is Pr(y_i|[x, y_{<i}]), where [x, y_{<i}] represents the full context for the prediction.
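To make the shorthand concrete, here is a minimal Python sketch of how Pr(y_i|x, y_{<i}) reduces to a query on the single concatenated sequence [x, y_{<i}]. The names `toy_next_token_probs`, `cond_prob`, and `VOCAB` are hypothetical and introduced only for illustration; the count-based scoring rule is an arbitrary stand-in for a real model, not something taken from the source.

```python
from typing import Dict, List

# Hypothetical three-token vocabulary, for illustration only.
VOCAB: List[str] = ["a", "b", "</s>"]


def toy_next_token_probs(context: List[str]) -> Dict[str, float]:
    """Stand-in for a model: maps ONE context sequence to a next-token distribution.

    Assumption: each token is scored by 1 + its count in the context, then the
    scores are normalized. A real autoregressive model would use a neural network.
    """
    scores = [1.0 + context.count(tok) for tok in VOCAB]
    total = sum(scores)
    return {tok: score / total for tok, score in zip(VOCAB, scores)}


def cond_prob(y_i: str, x: List[str], y_prev: List[str]) -> float:
    """Pr(y_i | x, y_{<i}), computed as Pr(y_i | [x, y_{<i}])."""
    # The two conditioning arguments are not treated separately:
    # they are concatenated into one context sequence.
    context = x + y_prev
    return toy_next_token_probs(context)[y_i]


if __name__ == "__main__":
    x = ["a", "b"]          # input sequence
    y = ["a", "b", "</s>"]  # output sequence
    for i, token in enumerate(y):
        # y[:i] plays the role of y_{<i}
        print(x + y[:i], token, cond_prob(token, x, y[:i]))
```

The point of the sketch is that `cond_prob` takes x and y_{<i} as two arguments, mirroring the shorthand, yet the model itself only ever sees the one list `x + y_prev`, mirroring the explicit form.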

Updated 2025-10-08

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences