Negative Log-Likelihood Objective for Softmax Regression

To optimize a classification model using maximum likelihood estimation, we compare our predicted conditional probabilities with the actual labels. Assuming the dataset's labels $\mathbf{Y}$ are independent given the features $\mathbf{X}$, the probability of observing the correct labels is the product of the individual probabilities:

$$P(\mathbf{Y} \mid \mathbf{X}) = \prod_{i=1}^n P(\mathbf{y}^{(i)} \mid \mathbf{x}^{(i)})$$
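
As a concrete, hypothetical illustration, the sketch below computes this product for three made-up softmax outputs; the arrays `probs` and `labels` are assumptions for demonstration, not values from the text:

```python
import numpy as np

# Hypothetical softmax outputs for n = 3 examples and 3 classes;
# each row sums to 1. The true class index of each example is in `labels`.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.3, 0.5]])
labels = np.array([0, 1, 2])

# P(Y | X): product of the probabilities the model assigns to the true labels.
per_example = probs[np.arange(len(labels)), labels]  # array([0.7, 0.8, 0.5])
joint_likelihood = per_example.prod()                # 0.7 * 0.8 * 0.5 = 0.28
print(joint_likelihood)
```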

Maximizing a product of many small probabilities is numerically unstable and computationally awkward, so we instead take the negative logarithm. Because the logarithm is monotonic, maximizing the likelihood is equivalent to minimizing the negative log-likelihood, and the product becomes a manageable sum of individual losses:

$$-\log P(\mathbf{Y} \mid \mathbf{X}) = \sum_{i=1}^n -\log P(\mathbf{y}^{(i)} \mid \mathbf{x}^{(i)}) = \sum_{i=1}^n l(\mathbf{y}^{(i)}, \hat{\mathbf{y}}^{(i)})$$
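
Continuing the same made-up example, the following sketch checks that the negative log of the product equals the sum of the per-example negative log-probabilities; all values are illustrative:

```python
import numpy as np

# Same hypothetical softmax outputs and true labels as in the sketch above.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.3, 0.5]])
labels = np.array([0, 1, 2])
per_example = probs[np.arange(len(labels)), labels]

# -log of the product equals the sum of per-example -log probabilities,
# i.e. the individual losses l(y, y_hat) summed over the dataset.
nll_from_product = -np.log(per_example.prod())
nll_from_sum = (-np.log(per_example)).sum()
print(nll_from_product, nll_from_sum)  # both roughly 1.273
```

In practice, frameworks compute these per-example losses directly from log-probabilities (e.g., a log-softmax of the logits), so an explicit product of many small probabilities is never formed.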

Tags: D2L, Dive into Deep Learning @ D2L