Formula

Maximum Likelihood Training Objective for a Dataset of Sequences

The training objective under the Maximum Likelihood Estimation (MLE) framework is to find the model parameters $\tilde{\theta}$ that maximize the total log-probability of all sequences in a dataset $\mathcal{D}$. The total is obtained by summing the log-probability of each individual sequence $\text{seq}$, as computed by the model parameterized by $\theta$. The general objective is formally expressed as:

$$\tilde{\theta} = \underset{\theta}{\arg\max} \sum_{\text{seq} \in \mathcal{D}} \log \text{Pr}_{\theta}(\text{seq})$$

For datasets composed of input-output pairs $(\mathbf{x}, \mathbf{y})$, this objective can be specified as maximizing the joint log-probability of the concatenated sequences:

$$\tilde{\theta} = \underset{\theta}{\arg\max} \sum_{(\mathbf{x},\mathbf{y}) \in \mathcal{D}} \log \text{Pr}_{\theta}(\text{seq}_{\mathbf{x},\mathbf{y}})$$

This approach is equivalent to maximizing the sum of the log-likelihoods of all data points in the training set.
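To make the objective concrete, the following is a minimal sketch that evaluates the summed log-probability for a tiny dataset and finds its maximizer in closed form. It assumes a toy unigram model in which each token is drawn independently, so that θ is simply a table of token probabilities; this model, the function names, and the example data are illustrative assumptions and are not part of the formula above. A real language model would instead condition each token on its prefix and would be trained by gradient-based optimization.

```python
import math
from collections import Counter

def sequence_log_prob(theta, seq):
    """log Pr_theta(seq) under a toy unigram model (tokens treated as independent)."""
    return sum(math.log(theta[tok]) for tok in seq)

def dataset_log_likelihood(theta, dataset):
    """The MLE objective: the sum of log Pr_theta(seq) over every seq in the dataset."""
    return sum(sequence_log_prob(theta, seq) for seq in dataset)

def mle_unigram(dataset):
    """Closed-form maximizer for the unigram model: relative token frequencies."""
    counts = Counter(tok for seq in dataset for tok in seq)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

# Hypothetical toy data: each (x, y) pair is concatenated into one sequence.
pairs = [(["a", "b"], ["a"]), (["b"], ["a", "a"])]
dataset = [x + y for x, y in pairs]

theta_mle = mle_unigram(dataset)
theta_uniform = {tok: 1 / len(theta_mle) for tok in theta_mle}

ll_mle = dataset_log_likelihood(theta_mle, dataset)
ll_uniform = dataset_log_likelihood(theta_uniform, dataset)
print(f"objective at MLE parameters:     {ll_mle:.4f}")
print(f"objective at uniform parameters: {ll_uniform:.4f}")
```

The parameters returned by `mle_unigram` give a higher value of the objective than the uniform alternative. For this toy model no other setting of the token probabilities scores higher on the same data.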

Updated 2026-05-02

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.2 Generative Models - Foundations of Large Language Models

Ch.4 Alignment - Foundations of Large Language Models
