Concept

Probability Computation with Pre-trained Language Models

Once a pre-trained language model's parameters are optimized (denoted $\hat{\theta}$), the decoder model $\mathrm{Decoder}_{\hat{\theta}}(\cdot)$ can be used to compute the probability of a token appearing at any given position within a sequence. Specifically, it computes the conditional probability $\Pr_{\hat{\theta}}(x_{i+1} \mid x_0, \ldots, x_i)$ of the next token $x_{i+1}$ given the preceding context.
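As a minimal sketch, assuming the Hugging Face transformers library with GPT-2 as a stand-in for $\mathrm{Decoder}_{\hat{\theta}}(\cdot)$, the conditional distribution over the next token can be read off the softmaxed logits at the last position:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 stands in for the pre-trained decoder Decoder_theta-hat;
# any decoder-only causal LM would work the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# The preceding context x_0, ..., x_i.
context = "The capital of France is"
inputs = tokenizer(context, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# The logits at the last position score every candidate next token x_{i+1};
# softmax turns them into Pr(x_{i+1} | x_0, ..., x_i) over the vocabulary.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Probability of one particular continuation (" Paris" is a single GPT-2 token).
token_id = tokenizer.encode(" Paris")[0]
print(f"Pr(' Paris' | context) = {next_token_probs[token_id].item():.4f}")
```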


Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences