Learn Before
Parameter Estimation via Conditional Log-Likelihood Maximization
In the context of training a Large Language Model (LLM), the optimal parameters, denoted as $\hat{\theta}$, are found by maximizing the conditional log-likelihood across a dataset $\mathcal{D}$. This supervised learning objective involves finding the parameters that maximize the sum of the logarithmic probabilities of the true outputs $y$ given the inputs $x$, where the probability is predicted by the LLM. The formula is expressed as:

$$\hat{\theta} = \arg\max_{\theta} \sum_{(x, y) \in \mathcal{D}} \log \Pr_{\theta}(y \mid x)$$

In some contexts, the input can be represented by other variables, such as a context $c$ and a latent variable $z$, leading to an equivalent formulation:

$$\hat{\theta} = \arg\max_{\theta} \sum_{(c, z, y) \in \mathcal{D}} \log \Pr_{\theta}(y \mid c, z)$$
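As a minimal sketch of this objective, the snippet below computes the summed log-probability of the true outputs for a toy discrete model. The lookup-table "model" and its vocabulary are assumptions for illustration, standing in for an LLM's predicted distribution:

```python
import math

# Hypothetical conditional model: a table mapping each input x to a
# probability distribution over outputs y (a stand-in for Pr_theta(y | x)).
def pr(y, x, theta):
    return theta[x][y]

def conditional_log_likelihood(dataset, theta):
    # The objective: sum of log Pr_theta(y | x) over all (x, y) pairs in D.
    return sum(math.log(pr(y, x, theta)) for x, y in dataset)

dataset = [("a", "yes"), ("b", "no"), ("a", "yes")]
theta_1 = {"a": {"yes": 0.9, "no": 0.1}, "b": {"yes": 0.3, "no": 0.7}}
theta_2 = {"a": {"yes": 0.5, "no": 0.5}, "b": {"yes": 0.5, "no": 0.5}}

# theta_1 assigns higher probability to the observed outputs, so it
# achieves the larger (less negative) log-likelihood — it is the better
# candidate under this objective.
assert conditional_log_likelihood(dataset, theta_1) > conditional_log_likelihood(dataset, theta_2)
```

In practice the maximization over $\theta$ is done by gradient ascent on this sum (equivalently, gradient descent on its negation, the cross-entropy loss), rather than by comparing a handful of candidate parameter settings.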

Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Related
Relationship between KL Divergence and MLE
Cross-entropy loss
Mean Squared Error
The property of consistency of maximum likelihood
Statistical Efficiency Principle of MLE
Maximum Likelihood Estimator Properties
Log-Likelihood Gradient
Maximum Likelihood Training Objective for a Dataset of Sequences
Kullback-Leibler Divergence
Model Selection via Likelihood
Training Objective as Loss Minimization over a Dataset
Mathematical Equivalence of General and Sequential MLE Objectives
A researcher is modeling a series of coin flips. They observe the following sequence of outcomes: Heads, Tails, Heads, Heads. The researcher wants to find the best parameter for their model, where the parameter represents the probability of the coin landing on Heads. According to the principle of maximum likelihood estimation, which of the following parameter values best explains the observed data?
Parameter Estimation via Conditional Log-Likelihood Maximization
Equivalence of Maximizing Likelihood and Minimizing Loss
Equivalence of Squared Loss and Maximum Likelihood Estimation
Negative Log-Likelihood Objective for Softmax Regression
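The coin-flip question above can be checked numerically. For the sequence Heads, Tails, Heads, Heads, the likelihood of a parameter $\theta = \Pr(\text{Heads})$ is $L(\theta) = \theta^3 (1 - \theta)$; the candidate values below are assumptions chosen for illustration:

```python
# Likelihood of observing H, T, H, H as a function of theta = Pr(Heads):
# three Heads contribute theta**3, one Tails contributes (1 - theta).
def likelihood(theta):
    return theta**3 * (1 - theta)

candidates = [0.25, 0.5, 0.75, 1.0]
best = max(candidates, key=likelihood)

# The analytic MLE for a Bernoulli parameter is the empirical frequency,
# 3 heads / 4 flips = 0.75, which indeed maximizes L among the candidates.
# Note theta = 1.0 gives likelihood zero: it cannot explain the observed Tails.
assert best == 0.75
```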
Learn After
Language Model as a Stochastic Policy
Plackett-Luce Loss Function
A model is being trained by maximizing the sum of log-probabilities for a dataset of 1,000 examples. Consider two scenarios for a single training update:
Scenario A: The probability assigned to the correct output for one example improves from 0.1 to 0.2. The probabilities for all other 999 examples remain unchanged.
Scenario B: The probability assigned to the correct output for one example improves from 0.8 to 0.9. The probabilities for all other 999 examples remain unchanged.
Which scenario leads to a larger increase in the overall training objective function, and why?
Model Comparison using Conditional Log-Likelihood
Evaluating a Training Update
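The two-scenario question above can be settled with a quick computation: because the objective sums log-probabilities, what matters is the change in the logarithm, i.e. the ratio of the new probability to the old one, not the absolute increase of 0.1:

```python
import math

# Change in the summed log-probability objective from each single-example
# update (all other 999 terms are unchanged, so they cancel).
delta_A = math.log(0.2) - math.log(0.1)  # = log(2), a doubling of probability
delta_B = math.log(0.9) - math.log(0.8)  # = log(1.125), a much smaller ratio

# Scenario A improves the objective more, even though both raise the
# probability by the same additive amount.
assert delta_A > delta_B
```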