Formula

Training Objective as Loss Minimization over a Dataset

The objective of training a model is to find the set of parameters, $\hat{\theta}$, that minimizes the total loss across an entire dataset of sequences, $\mathcal{D}$. This optimization problem is formally expressed as:

$$\hat{\theta} = \arg\min_{\theta} \sum_{\mathbf{x} \in \mathcal{D}} \mathrm{Loss}_{\theta}(\mathbf{x})$$

This loss minimization objective is mathematically equivalent to the principle of Maximum Likelihood Estimation (MLE). Specifically, when the loss function is defined as the negative log-likelihood of the data, $\mathrm{Loss}_{\theta}(\mathbf{x}) = -\log p_{\theta}(\mathbf{x})$, minimizing the summed loss is the same as maximizing the likelihood of the data given the model parameters.
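A minimal sketch of this equivalence, using an assumed toy setup: a one-parameter Bernoulli "model" of binary token sequences, where the loss is the negative log-likelihood. Grid-searching $\theta$ for the argmin of the summed loss recovers the maximum-likelihood estimate (the empirical frequency of 1s). The corpus, grid, and function names here are illustrative, not from the original text.

```python
import math

# Toy dataset D of binary token sequences; theta = P(token == 1).
corpus = [[1, 1, 0], [1, 0, 1], [1, 1, 1]]

def nll(theta, seq):
    # Loss_theta(x): negative log-likelihood of one sequence.
    return -sum(math.log(theta if t == 1 else 1.0 - theta) for t in seq)

def total_loss(theta):
    # The summed loss over D -- the quantity being minimized.
    return sum(nll(theta, x) for x in corpus)

# Crude grid search for the argmin over theta in (0, 1).
thetas = [i / 1000 for i in range(1, 1000)]
theta_hat = min(thetas, key=total_loss)

# The MLE for a Bernoulli model is the fraction of 1s: 7/9 ~= 0.778,
# so the loss-minimizing theta lands on the same value.
print(theta_hat)
```

Minimizing the negative log-likelihood rather than maximizing the likelihood directly is purely a reformulation: the logarithm is monotone and the sign flip turns a maximization into a minimization, so both problems share the same argmin.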


Updated 2026-05-02


