Formula

Applying Log-Likelihood Calculation to a Training Dataset

The log-likelihood of a sequence $\mathbf{x}$ is computed by summing the log-probabilities of each token conditioned on its preceding context. This sequence-level computation is formally expressed as $\mathcal{L}_{\theta}(\mathbf{x}) = \sum_{i=1}^{m} \log \mathrm{Pr}_{\theta}(x_i \mid x_0, \ldots, x_{i-1})$, where the subscript $\theta$ affixed to both $\mathcal{L}(\cdot)$ and $\mathrm{Pr}(\cdot)$ denotes the parameters of the language model. Summing this quantity over all sequences in a training dataset gives the objective that is maximized with respect to $\theta$ during training.
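The computation can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `sequence_log_likelihood`, `dataset_log_likelihood`, and `uniform_model` are hypothetical, and the model is assumed to be exposed as a function returning the conditional probability of a token given its preceding context. The first element of each sequence plays the role of $x_0$ and contributes no term of its own.

```python
import math
from typing import Callable, Sequence

# Assumed interface (hypothetical): cond_prob(token, context) returns
# Pr_theta(token | context) as a float in (0, 1].
CondProb = Callable[[str, Sequence[str]], float]


def sequence_log_likelihood(tokens: Sequence[str], cond_prob: CondProb) -> float:
    """Sum log Pr(x_i | x_0, ..., x_{i-1}) for i = 1..m, where tokens[0] is x_0."""
    return sum(
        math.log(cond_prob(tokens[i], tokens[:i]))
        for i in range(1, len(tokens))
    )


def dataset_log_likelihood(dataset: Sequence[Sequence[str]], cond_prob: CondProb) -> float:
    """Add up the sequence-level log-likelihoods over every sequence in the dataset."""
    return sum(sequence_log_likelihood(seq, cond_prob) for seq in dataset)


def uniform_model(token: str, context: Sequence[str]) -> float:
    """Toy stand-in for a language model: uniform over a 4-token vocabulary,
    ignoring the context entirely. A real model would depend on the context."""
    return 0.25


if __name__ == "__main__":
    data = [["<s>", "a", "b"], ["<s>", "c"]]
    print(sequence_log_likelihood(data[0], uniform_model))
    print(dataset_log_likelihood(data, uniform_model))
```

With the toy uniform model, each predicted token contributes `math.log(0.25)`, so the dataset total is simply that value times the number of predicted tokens. Training would adjust the parameters behind `cond_prob` to make this total as large as possible.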


Updated 2026-04-19


Tags

Ch.2 Generative Models - Foundations of Large Language Models