Formula

Log-Likelihood Objective for Language Model Training

The log-likelihood of a sequence $\mathbf{x} = (x_0, \dots, x_m)$ is computed by summing the log-probabilities of predicting each token given its predecessors. This follows from the chain rule of probability:

$\log \mathrm{Pr}(\mathbf{x}) = \log \mathrm{Pr}(x_0) + \sum_{j=1}^{m} \log \mathrm{Pr}(x_j \mid \mathbf{x}_{<j})$

For practical training purposes, the probability of the initial token, $\mathrm{Pr}(x_0)$, is often assumed to be 1 (making its log-probability 0), especially when it is a fixed start-of-sequence symbol. This simplifies the objective to summing only the conditional log-probabilities for the remaining tokens:

$\mathcal{L}_{\theta}(\mathbf{x}) = \sum_{j=1}^{m} \log \mathrm{Pr}_{\theta}(x_j \mid \mathbf{x}_{<j})$

In short, the process involves calculating the token prediction log-probability at each position in the sequence and then adding these values together.
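The summation above can be sketched in a few lines of Python. This is a minimal illustration, not a training loop: the list of conditional probabilities stands in for whatever a model $\mathrm{Pr}_{\theta}$ would output, and the helper name `sequence_log_likelihood` is chosen here for illustration.

```python
import math

def sequence_log_likelihood(token_probs):
    """Sum the per-position conditional log-probabilities.

    token_probs[j] plays the role of Pr(x_j | x_<j) for positions
    1..m. The start symbol x_0 is treated as fixed (Pr = 1, so its
    log-probability is 0 and contributes nothing to the sum).
    """
    return sum(math.log(p) for p in token_probs)

# Toy example: a hypothetical model assigns these conditional
# probabilities to the three tokens following the start symbol.
probs = [0.5, 0.25, 0.8]
ll = sequence_log_likelihood(probs)
# Summing logs equals the log of the product: 0.5 * 0.25 * 0.8 = 0.1,
# so ll == log(0.1).
```

Note that summing log-probabilities is numerically safer than multiplying raw probabilities, which underflow quickly for long sequences; this is the practical reason the objective is stated in log form.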

Updated 2026-05-02
