Comparison of Output Probability Meaning: Language Modeling vs. Encoder Pre-training

The interpretation of the output probability distribution $\mathbf{p}_i$ differs significantly between standard language modeling and encoder pre-training. In standard language modeling, which decodes auto-regressively, $\mathbf{p}_i$ is the probability of the next word given only the tokens up to position $i$: the model never observes future tokens. By contrast, during encoder pre-training the model has access to the entire input sequence at once, so predicting a token it can already see is meaningless. Masked language modeling resolves this by hiding a subset of the input tokens; $\mathbf{p}_i$ at a masked position $i$ is then the probability of recovering the original token there, conditioned on the full (partially masked) sequence.

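To make the contrast concrete, here is a minimal PyTorch sketch (a hypothetical illustration, not code from the book; the `attention_scores` helper, the tensor shapes, and the `MASK_ID` convention are all assumptions). The only mechanical difference between the two settings is the attention mask: causal for auto-regressive language modeling, full for an encoder. Masked language modeling then corrupts the encoder's input so that the prediction at a masked position is no longer trivial.

```python
import torch

vocab_size, seq_len, d_model = 100, 6, 16

def attention_scores(q, k, mask):
    """Scaled dot-product attention weights; mask is True where attending is allowed."""
    scores = q @ k.transpose(-2, -1) / d_model ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return scores.softmax(dim=-1)

# Auto-regressive LM: a causal (lower-triangular) mask lets position i see
# only positions <= i, so p_i can be read as P(next token | x_1 ... x_i).
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Encoder pre-training: a full mask lets every position see the whole
# sequence, so p_i for an already-visible token carries no information.
full_mask = torch.ones(seq_len, seq_len, dtype=torch.bool)

q = k = torch.randn(seq_len, d_model)
print(attention_scores(q, k, causal_mask))  # zero weight above the diagonal
print(attention_scores(q, k, full_mask))    # every position attends everywhere

# Masked LM input: hide a random ~15% of tokens before the encoder sees them.
MASK_ID = vocab_size               # hypothetical extra id reserved for [MASK]
tokens = torch.randint(0, vocab_size, (seq_len,))
is_masked = torch.rand(seq_len) < 0.15
inputs = tokens.masked_fill(is_masked, MASK_ID)
# Training computes the loss only at masked positions, where p_i predicts
# the original token from the partially masked context.
```

The convention assumed here is that `True` in the mask means "may attend"; disallowed positions receive $-\infty$ before the softmax, so their attention weights become exactly zero.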