Comparison of Output Probability Meaning: Language Modeling vs. Encoder Pre-training
The interpretation of the output probability distribution, p_i, differs significantly between standard language models and encoder pre-training. In standard language modeling, which uses an auto-regressive decoding process, p_i represents the probability distribution over the next word, given that the model observes only the preceding tokens up to position i. By contrast, during encoder pre-training, the model has access to the entire input sequence at once, making it meaningless to predict tokens that are already observed; instead, p_i at a masked position is interpreted as a distribution over the token hidden there.
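As a minimal sketch of this contrast (with random NumPy arrays standing in for a trained encoder, so the names h_causal, h_bidir, W, and softmax are illustrative, not from the source), both settings compute p_i = Softmax(h_i W), but the distribution is read differently:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, seq_len = 8, 4, 5

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Softmax layer weight matrix W (random stand-in for trained parameters).
W = rng.normal(size=(d_model, vocab_size))

# Auto-regressive language modeling: h_i may depend only on tokens up to
# position i, so p_i = Softmax(h_i W) is read as a distribution over the
# NEXT token, at position i+1.
h_causal = rng.normal(size=(seq_len, d_model))  # stand-in for causal encoder states
p_next = softmax(h_causal @ W)                  # p_next[i] predicts the token after i

# Encoder pre-training: the encoder reads the whole sequence at once, so
# predicting an already-observed token is meaningless; instead a position
# is masked, and p_i there is read as a distribution over the hidden token.
masked_pos = 2
h_bidir = rng.normal(size=(seq_len, d_model))   # stand-in for bidirectional states
p_masked = softmax(h_bidir[masked_pos] @ W)     # distribution for the masked token

assert np.isclose(p_masked.sum(), 1.0)
```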
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Simplified Notation for Parameterized Models
Evaluating Component Independence in a Language Model
A language model computes probability distributions for a sequence of tokens x using a two-stage process: an encoder with parameters θ generates representations, which are then passed to a Softmax layer with a weight matrix W. This model consistently outputs a nearly uniform probability distribution for every token position, meaning every word in the vocabulary is considered almost equally likely, regardless of the input. Which of the following is the most direct and plausible explanation for this behavior?
A language model calculates the probability distribution for each token in an input sequence, x, by first generating a sequence of numerical representations and then applying a final transformation. Arrange the following steps in the correct computational order to produce the probability vector, p_i, for the token at a specific position i.
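For concreteness, here is a minimal sketch of that computational order, again with random arrays in place of a trained encoder (the names h, W, and softmax are illustrative assumptions, not from the source):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(1)
seq_len, d_model, vocab_size = 6, 4, 10

# Step 1: encode the token sequence x into representations h_1 ... h_n
# (a random placeholder for an encoder with parameters theta).
h = rng.normal(size=(seq_len, d_model))

# Step 2: project the representation at position i with the Softmax
# layer's weight matrix W to get a vector of vocabulary scores.
W = rng.normal(size=(d_model, vocab_size))
i = 3
logits_i = h[i] @ W

# Step 3: normalize the scores with Softmax to obtain the probability
# vector p_i over the vocabulary.
p_i = softmax(logits_i)
assert np.isclose(p_i.sum(), 1.0)
```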
Learn After
Interpreting Model Output Probabilities
An engineer is working with two different text-processing systems. System A generates a story one word at a time. To choose the word at position i, it calculates a probability distribution over the vocabulary based only on the words from position 1 to i-1. System B is used for a fill-in-the-blank task. Given a sentence with a missing word at position i, it calculates a probability distribution for that position using all other words in the sentence (both before and after position i) as context. Which statement best analyzes the meaning of the probability distributions in these two systems?
Consider a model tasked with predicting a masked word within a complete sentence by looking at all surrounding words. The probability distribution calculated for this masked position has the same fundamental interpretation as the distribution from a model that generates a sentence one word at a time, where each new word is predicted based only on the words that came before it.