Formula

Sample-wise Negative Log-Likelihood Loss for a Sub-sequence

When evaluating a model on a specific training instance, the loss is computed only over the target sub-sequence $\mathbf{y}_{\mathrm{sample}}$, rather than over the full sequence. For a model with parameters $\hat{\theta}^+$, this loss is the negative log-likelihood of generating the output sub-sequence given the input sub-sequence $\mathbf{x}_{\mathrm{sample}}$. The formula is:

$$\mathcal{L}_{\hat{\theta}^+}(\mathrm{sample}) = -\log \mathrm{Pr}_{\hat{\theta}^+}(\mathbf{y}_{\mathrm{sample}} \mid \mathbf{x}_{\mathrm{sample}})$$
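Since an autoregressive model factors the sequence probability into a product of per-token conditionals, the negative log of that product is simply a sum of per-token negative log-probabilities. A minimal sketch of this computation, assuming the model has already produced a probability $\mathrm{Pr}_{\hat{\theta}^+}(y_t \mid \mathbf{x}_{\mathrm{sample}}, y_{<t})$ for each target token (the function name and example values are illustrative, not from the source):

```python
import math

def sample_nll(token_probs):
    """Negative log-likelihood of a target sub-sequence y_sample.

    token_probs: the model's probability for each target token,
    Pr(y_t | x_sample, y_{<t}).  The sub-sequence probability is
    their product, so the loss -log Pr(y_sample | x_sample) equals
    the sum of per-token negative log-probabilities.
    """
    return -sum(math.log(p) for p in token_probs)

# Hypothetical per-token probabilities for a 3-token target sub-sequence.
probs = [0.5, 0.25, 0.8]
loss = sample_nll(probs)
```

Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow on long sequences; the two forms are mathematically identical.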

Updated 2026-05-02

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.3 Prompting - Foundations of Large Language Models