Learn Before
Formula

Parameter Estimation via Conditional Log-Likelihood Maximization

In the context of training a Large Language Model (LLM), the optimal parameters, denoted $\hat{\theta}$, are found by maximizing the conditional log-likelihood over a dataset $D$. This supervised learning objective seeks the parameters $\theta$ that maximize the sum of the log-probabilities of the true outputs $y$ given the inputs $x$, where the probability $\text{Pr}_{\theta}(y \mid x)$ is predicted by the LLM. The formula is expressed as:

$$\hat{\theta} = \underset{\theta}{\arg\max} \sum_{(x,y) \in D} \log \text{Pr}_{\theta}(y \mid x)$$

In some contexts, the input $x$ can be represented by other variables, such as a context $c$ and a latent variable $z$, leading to an equivalent formulation:

$$\hat{\theta} = \underset{\theta}{\arg\max} \sum_{(x,y) \in D} \log \text{Pr}_{\theta}(y \mid c, z)$$
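To make the objective concrete, below is a minimal sketch of conditional log-likelihood maximization in Python with PyTorch. The toy model `TinyLM`, the vocabulary size, and the randomly generated dataset `D` are illustrative assumptions rather than anything from the course material; the point is only that maximizing $\sum \log \text{Pr}_{\theta}(y \mid x)$ is implemented in practice by minimizing the negative log-likelihood with gradient descent.

```python
# Minimal sketch of conditional log-likelihood maximization.
# Assumptions: a hypothetical toy model (TinyLM) and a random dataset D of
# (x, y) token-id pairs; this is not the book's actual training code.
import torch
import torch.nn as nn

VOCAB = 100  # hypothetical vocabulary size

class TinyLM(nn.Module):
    """Toy conditional model: embeds the input tokens x and predicts
    a distribution over the single output token y."""
    def __init__(self, vocab=VOCAB, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.out = nn.Linear(dim, vocab)

    def forward(self, x):               # x: (batch, seq_len)
        h = self.embed(x).mean(dim=1)   # crude pooling over the input tokens
        return self.out(h)              # logits over the vocabulary

model = TinyLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Hypothetical dataset D of (x, y) pairs: random token ids for illustration.
D = [(torch.randint(0, VOCAB, (1, 8)), torch.randint(0, VOCAB, (1,)))
     for _ in range(64)]

for epoch in range(3):
    total_log_likelihood = 0.0
    for x, y in D:
        logits = model(x)
        # log Pr_theta(y | x): log-softmax gives log-probabilities; index y.
        log_prob = torch.log_softmax(logits, dim=-1)[0, y.item()]
        loss = -log_prob          # maximizing log-likelihood == minimizing NLL
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_log_likelihood += log_prob.item()
    print(f"epoch {epoch}: sum log Pr(y|x) = {total_log_likelihood:.2f}")
```

In the equivalent formulation with a context $c$ and a latent variable $z$, the input tensor $x$ above would simply be the model's encoding of $(c, z)$; the loss and update rule stay the same.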


Updated 2025-10-08

Tags

Data Science

Foundations of Large Language Models Course

Computing Sciences

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models