Pre-trained Language Model Decoder Inference
Once an autoregressive language model has been optimized via Maximum Likelihood Estimation to find the parameters θ̂, the pre-trained model, denoted Pr_θ̂(·), can be used to compute the conditional probability Pr_θ̂(x_i | x_0, ..., x_{i-1}) of the next token at each position i within a given sequence.
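A minimal sketch of this inference step, assuming the Hugging Face transformers library and the public "gpt2" checkpoint; the model choice, prompt, and top-5 printout are illustrative, not from the source:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; any pre-trained causal LM would work the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, seq_len, vocab_size)

# The logits at the last position parameterize Pr_theta_hat(x_i | x_0..x_{i-1});
# a softmax turns them into a distribution over the vocabulary.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

top_probs, top_ids = next_token_probs.topk(5)
for p, tok_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(tok_id):>10s}  {p.item():.4f}")
```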
Tags
Foundations of Large Language Models
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Log-Likelihood Objective for Language Model Training
Formulating the MLE Objective for a Small Dataset
Total Loss Calculation for a Token Sequence
A model is being trained on a dataset containing just two sequences: seq_1 = (x_0, x_1) and seq_2 = (y_0, y_1, y_2). According to the principle of maximum likelihood estimation for sequential data, which expression correctly represents the decomposed log-probability that the model aims to maximize for this entire dataset?

When training a model on a sequence of data using the Maximum Likelihood Estimation objective, a single prediction with a very low conditional probability for one element in the sequence can have a disproportionately large negative impact on the total log-probability calculated for that entire sequence.
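For the question above, the chain-rule decomposition yields one log-term per token, summed across both sequences; a worked expansion, assuming the two sequences are independent samples so their log-probabilities add:

```latex
\log \Pr_{\theta}(\text{seq}_1) + \log \Pr_{\theta}(\text{seq}_2)
  = \log \Pr_{\theta}(x_0) + \log \Pr_{\theta}(x_1 \mid x_0)
  + \log \Pr_{\theta}(y_0) + \log \Pr_{\theta}(y_1 \mid y_0)
  + \log \Pr_{\theta}(y_2 \mid y_0, y_1)
```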
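The second point, that one near-zero conditional probability dominates the sum, follows from log p → −∞ as p → 0. A small numeric sketch with hypothetical per-token probabilities (the values are made up for illustration):

```python
import math

# Hypothetical conditional probabilities; values are illustrative only.
probs_seq1 = [0.9, 0.8]          # Pr(x_0), Pr(x_1 | x_0)
probs_seq2 = [0.7, 0.9, 1e-6]    # Pr(y_0), Pr(y_1 | y_0), Pr(y_2 | y_0, y_1)

total_log_prob = sum(math.log(p) for p in probs_seq1 + probs_seq2)
print(f"total log-probability: {total_log_prob:.3f}")          # ≈ -14.606
print(f"contribution of the 1e-6 term: {math.log(1e-6):.3f}")  # ≈ -13.816
```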