Maximum Likelihood Estimation for Sequential Data
In the context of sequential data, maximum likelihood estimation aims to find the optimal language model parameters by maximizing the total sequence-level log-likelihood across a given dataset $D$. The objective of maximum likelihood training is formally defined as:

$$\hat{\theta} = \arg\max_{\theta} \sum_{\mathbf{x} \in D} \log \Pr_{\theta}(\mathbf{x}),$$

where $\log \Pr_{\theta}(\mathbf{x}) = \sum_{i} \log \Pr_{\theta}(x_i \mid x_0, \dots, x_{i-1})$ represents the sum of the conditional log-probabilities for an individual complete sequence.
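As a minimal sketch of this objective in Python, assuming a hypothetical `cond_log_prob(params, prefix, token)` helper that returns $\log \Pr_{\theta}(x_i \mid x_{<i})$ (the helper name, toy model, and data layout below are illustrative, not from the course):

```python
import math

def sequence_log_prob(params, seq, cond_log_prob):
    """Sum of conditional log-probabilities for one complete sequence:
    log Pr(x) = sum_i log Pr(x_i | x_<i)."""
    return sum(cond_log_prob(params, seq[:i], seq[i]) for i in range(len(seq)))

def mle_objective(params, dataset, cond_log_prob):
    """Total sequence-level log-likelihood over the dataset.
    MLE selects the params that maximize this quantity."""
    return sum(sequence_log_prob(params, seq, cond_log_prob) for seq in dataset)

# Toy "model" that assigns the same fixed probability to every token.
toy_params = {"p": 0.5}
toy_cond_log_prob = lambda params, prefix, token: math.log(params["p"])

dataset = [["a", "b"], ["a", "b", "c"]]
print(mle_objective(toy_params, dataset, toy_cond_log_prob))  # 5 * log(0.5)
```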
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.2 Generative Models - Foundations of Large Language Models
Related
Maximum Likelihood Estimation for Sequential Data
Fine-Tuning as Maximum Likelihood Estimation
Log-Probability Decomposition for Efficient Multi-Turn Dialogue Training
A language model is being trained on a dataset containing a mix of very short sequences and a few extremely long sequences. A developer observes that the overall training objective, which is the sum of the log-probabilities of all sequences in the dataset, seems to be disproportionately influenced by the model's performance on the few long sequences. Which of the following best explains this observation?
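One way to see why this happens, as a rough back-of-the-envelope sketch (assuming for illustration that every token is predicted with about the same probability $p$): a sequence of length $m$ contributes roughly

$$\log \Pr(\mathbf{x}) = \sum_{i=1}^{m} \log \Pr(x_i \mid x_{<i}) \approx m \log p,$$

so a 10,000-token sequence carries about 100 times the weight of a 100-token sequence in the summed objective.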
Model Parameter Selection via Likelihood
A language model is being trained on a large dataset of text sequences. After a single parameter update, the model's calculated log-probability for one specific sequence in the dataset increases by 2.5, while the log-probabilities for all other sequences in the dataset remain exactly the same. How does this change affect the overall maximum likelihood training objective for the entire dataset?
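Because the overall objective is a plain sum of per-sequence log-probabilities, the change in the total is just the sum of the per-sequence changes. With the numbers above:

$$\Delta \mathcal{L} = \sum_{\mathbf{x} \in D} \Delta \log \Pr_{\theta}(\mathbf{x}) = 2.5 + 0 + \cdots + 0 = 2.5,$$

so the total objective increases by exactly 2.5. ($\mathcal{L}$ is used here only as shorthand for the summed objective, not as notation from the course.)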
Standard Optimization Objective for Transformer Language Models
Maximum Likelihood Estimation for Sequential Data
In training a model on a dataset $D$ of sequences $\mathbf{x}$, a primary goal is to find parameters that maximize the total log-probability of the observed sequences. This objective can be expressed in two equivalent ways:

Form 1: $\hat{\theta} = \arg\max_{\theta} \sum_{\mathbf{x} \in D} \log \Pr_{\theta}(\mathbf{x})$

Form 2: $\hat{\theta} = \arg\max_{\theta} \sum_{\mathbf{x} \in D} \sum_{i} \log \Pr_{\theta}(x_i \mid x_0, \dots, x_{i-1})$

What fundamental principle of probability justifies the mathematical equivalence between Form 1 and Form 2?
Verifying Log-Probability Equivalence
Analysis of a Language Model Training Objective
The mathematical equivalence between maximizing the log-probability of an entire sequence, $\log \Pr(\mathbf{x})$, and maximizing the sum of its conditional log-probabilities, $\sum_i \log \Pr(x_i \mid x_{<i})$, is established because the chain rule of probability factors $\Pr(\mathbf{x})$ into a product of conditional probabilities, and the logarithm transforms that product into a sum of log-probabilities.
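As a quick numeric check of this equivalence (a minimal sketch with made-up conditional probabilities, not course code), the product-then-log route and the sum-of-logs route give the same value:

```python
import math

# Made-up conditional probabilities for a 3-token sequence:
# Pr(x_0), Pr(x_1 | x_0), Pr(x_2 | x_0, x_1)
cond_probs = [0.2, 0.5, 0.8]

# Route 1: chain rule first (product of conditionals), then log.
log_prob_whole = math.log(math.prod(cond_probs))

# Route 2: log of each conditional, then sum.
log_prob_sum = sum(math.log(p) for p in cond_probs)

assert math.isclose(log_prob_whole, log_prob_sum)
print(log_prob_whole, log_prob_sum)  # both ≈ -2.5257
```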
Learn After
Log-Likelihood Objective for Language Model Training
Formulating the MLE Objective for a Small Dataset
Total Loss Calculation for a Token Sequence
A model is being trained on a dataset containing just two sequences:
seq_1 = (x_0, x_1) and seq_2 = (y_0, y_1, y_2). According to the principle of maximum likelihood estimation for sequential data, which expression correctly represents the decomposed log-probability that the model aims to maximize for this entire dataset?

When training a model on a sequence of data using the Maximum Likelihood Estimation objective, a single prediction with a very low conditional probability for one element in the sequence can have a disproportionately large negative impact on the total log-probability calculated for that entire sequence.
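For the two-sequence dataset above, the decomposed objective works out as follows (a worked expansion under the standard chain-rule factorization; only the grouping of terms is added here):

$$\log \Pr(\text{seq}_1) + \log \Pr(\text{seq}_2) = \log \Pr(x_0) + \log \Pr(x_1 \mid x_0) + \log \Pr(y_0) + \log \Pr(y_1 \mid y_0) + \log \Pr(y_2 \mid y_0, y_1)$$

The second statement follows from this same sum: a single term $\log \Pr(x_i \mid x_{<i})$ tends to $-\infty$ as the conditional probability approaches zero, so one badly predicted token can dominate the entire sequence's log-probability.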
Pre-trained Language Model Decoder Inference