Formula

Maximum Likelihood Estimation for Sequential Data

In the context of sequential data, maximum likelihood estimation aims to find the optimal language model parameters $\hat{\theta}$ by maximizing the total sequence-level log-likelihood across a given dataset $\mathcal{D}$. This objective of maximum likelihood training is formally defined as: $\hat{\theta} = \argmax_{\theta} \sum_{\mathbf{x} \in \mathcal{D}} \mathcal{L}_{\theta}(\mathbf{x})$, where $\mathcal{L}_{\theta}(\mathbf{x})$ represents the sum of the conditional log-probabilities for an individual complete sequence.
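To make the objective concrete, the sketch below evaluates the sequence-level log-likelihood and the dataset-level sum for a toy model. It assumes a bigram model, where each token is conditioned only on the previous one; this is an illustrative simplification and not the model described in the source. For such a model the maximizing parameters happen to have a closed form (normalized counts), so no gradient-based training is needed. The names `BOS`, `fit_bigram_mle`, and the toy dataset are hypothetical.

```python
import math
from collections import Counter, defaultdict

BOS = "<s>"  # hypothetical start-of-sequence symbol (an assumption of this sketch)

def sequence_log_likelihood(probs, seq):
    """L_theta(x): sum of conditional log-probabilities over one complete sequence."""
    total = 0.0
    prev = BOS
    for tok in seq:
        total += math.log(probs[prev][tok])  # log Pr(token | previous token)
        prev = tok
    return total

def dataset_log_likelihood(probs, dataset):
    """The quantity being maximized: sum of L_theta(x) over all x in D."""
    return sum(sequence_log_likelihood(probs, seq) for seq in dataset)

def fit_bigram_mle(dataset):
    """Closed-form maximizer for the toy bigram model: normalized bigram counts."""
    counts = defaultdict(Counter)
    for seq in dataset:
        prev = BOS
        for tok in seq:
            counts[prev][tok] += 1
            prev = tok
    return {
        prev: {tok: c / sum(ctr.values()) for tok, c in ctr.items()}
        for prev, ctr in counts.items()
    }

# Toy dataset D of three complete sequences over the vocabulary {a, b}.
dataset = [["a", "b", "a"], ["a", "a", "b"], ["b", "a", "b"]]

theta_hat = fit_bigram_mle(dataset)

# A competing parameter setting: uniform probabilities in every context.
theta_uniform = {ctx: {"a": 0.5, "b": 0.5} for ctx in (BOS, "a", "b")}

ll_hat = dataset_log_likelihood(theta_hat, dataset)
ll_uniform = dataset_log_likelihood(theta_uniform, dataset)
print(f"log-likelihood at theta_hat:   {ll_hat:.4f}")
print(f"log-likelihood at uniform:     {ll_uniform:.4f}")
```

On this toy data the count-based estimate should score a higher total log-likelihood than the uniform alternative, which is what the $\argmax$ in the objective asks for. For neural language models no closed form exists, and the same sum is typically maximized approximately with gradient-based optimization.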


Updated 2026-05-02


Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.2 Generative Models - Foundations of Large Language Models

Related