Formula

Pre-trained Language Model Decoder Inference

Once an autoregressive language model has been optimized via Maximum Likelihood Estimation to obtain parameters $\hat{\theta}$, the pre-trained model, denoted $\mathrm{Decoder}_{\hat{\theta}}(\cdot)$, can be used to compute the conditional probability $\mathrm{Pr}_{\hat{\theta}}(x_{i+1} \mid x_0, \ldots, x_i)$ of the next token at each position within a given sequence.
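
As a concrete illustration, here is a minimal sketch of this inference step using the Hugging Face transformers library (an assumption not named in the text above): it loads a pre-trained decoder-only model, runs a forward pass over a prefix $x_0, \ldots, x_i$, and reads $\mathrm{Pr}_{\hat{\theta}}(x_{i+1} \mid x_0, \ldots, x_i)$ off the logits at the last position. The checkpoint name "gpt2" and the example prompt are illustrative choices only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a pre-trained autoregressive decoder and its tokenizer.
# "gpt2" is only an illustrative choice of checkpoint.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Encode a prefix x_0, ..., x_i (example prompt chosen arbitrarily).
input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

with torch.no_grad():
    # logits has shape (batch, sequence_length, vocab_size)
    logits = model(input_ids).logits

# The logits at the last position parameterize the distribution
# Pr_theta_hat(x_{i+1} | x_0, ..., x_i) over the vocabulary.
next_token_probs = torch.softmax(logits[0, -1, :], dim=-1)

# Inspect the five most probable next tokens.
top_probs, top_ids = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(token_id.item()):>12}  {prob.item():.4f}")
```

The same forward pass yields a full next-token distribution at every position of the input, which is what decoding strategies such as greedy search or sampling build on.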
