Concept

Probability Computation with Pre-trained Language Models

Once a pre-trained language model's parameters are optimized (denoted $\hat{\theta}$), the decoder model $\mathrm{Decoder}_{\hat{\theta}}(\cdot)$ can be used to compute the probability of a token appearing at any given position within a sequence. Specifically, it computes the conditional probability $\Pr_{\hat{\theta}}(x_{i+1} \mid x_0, \ldots, x_i)$ of the next token $x_{i+1}$ given the preceding context.
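As a minimal sketch, assuming the Hugging Face transformers library with GPT-2 as a stand-in for $\mathrm{Decoder}_{\hat{\theta}}(\cdot)$, the conditional distribution over the next token can be read off the softmaxed logits at the last position:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 stands in for the pre-trained decoder Decoder_theta-hat;
# any decoder-only causal LM would work the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# The preceding context x_0, ..., x_i.
context = "The capital of France is"
inputs = tokenizer(context, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# The logits at the last position score every candidate next token x_{i+1};
# softmax turns them into Pr(x_{i+1} | x_0, ..., x_i) over the vocabulary.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Probability of one particular continuation (" Paris" is a single GPT-2 token).
token_id = tokenizer.encode(" Paris")[0]
print(f"Pr(' Paris' | context) = {next_token_probs[token_id].item():.4f}")
```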


Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences