Negative Log-Likelihood Objective for Softmax Regression

To optimize a classification model using maximum likelihood estimation, we compare our predicted conditional probabilities with the actual labels. Assuming the dataset's labels $\mathbf{Y}$ are independent given the features $\mathbf{X}$, the probability of observing the correct labels is the product of the individual probabilities:

$$P(\mathbf{Y} \mid \mathbf{X}) = \prod_{i=1}^n P(\mathbf{y}^{(i)} \mid \mathbf{x}^{(i)})$$
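
As a concrete, hypothetical illustration, the sketch below computes this product for three made-up softmax outputs; the arrays `probs` and `labels` are assumptions for demonstration, not values from the text:

```python
import numpy as np

# Hypothetical softmax outputs for n = 3 examples and 3 classes;
# each row sums to 1. The true class index of each example is in `labels`.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.3, 0.5]])
labels = np.array([0, 1, 2])

# P(Y | X): product of the probabilities the model assigns to the true labels.
per_example = probs[np.arange(len(labels)), labels]  # array([0.7, 0.8, 0.5])
joint_likelihood = per_example.prod()                # 0.7 * 0.8 * 0.5 = 0.28
print(joint_likelihood)
```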

Maximizing a product of many small probabilities is numerically unstable and computationally awkward, so we instead take the negative logarithm. Because the logarithm is monotonic, maximizing the likelihood is equivalent to minimizing the negative log-likelihood, and the product becomes a manageable sum of individual losses:

$$-\log P(\mathbf{Y} \mid \mathbf{X}) = \sum_{i=1}^n -\log P(\mathbf{y}^{(i)} \mid \mathbf{x}^{(i)}) = \sum_{i=1}^n l(\mathbf{y}^{(i)}, \hat{\mathbf{y}}^{(i)})$$
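
Continuing the same made-up example, the following sketch checks that the negative log of the product equals the sum of the per-example negative log-probabilities; all values are illustrative:

```python
import numpy as np

# Same hypothetical softmax outputs and true labels as in the sketch above.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.3, 0.5]])
labels = np.array([0, 1, 2])
per_example = probs[np.arange(len(labels)), labels]

# -log of the product equals the sum of per-example -log probabilities,
# i.e. the individual losses l(y, y_hat) summed over the dataset.
nll_from_product = -np.log(per_example.prod())
nll_from_sum = (-np.log(per_example)).sum()
print(nll_from_product, nll_from_sum)  # both roughly 1.273
```

In practice, frameworks compute these per-example losses directly from log-probabilities (e.g., a log-softmax of the logits), so an explicit product of many small probabilities is never formed.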

Tags: D2L, Dive into Deep Learning @ D2L