Formula

Cross-Entropy Loss for Softmax Regression

For a pair of a one-hot label vector $\mathbf{y}$ and a model's predicted probability distribution $\hat{\mathbf{y}}$ over $q$ classes, the cross-entropy loss function is defined as:

$$l(\mathbf{y}, \hat{\mathbf{y}}) = - \sum_{j=1}^q y_j \log \hat{y}_j$$
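As a quick illustration (a minimal sketch, not taken from the book), the loss can be computed directly from this definition. The helper `cross_entropy` and the example probabilities below are assumptions chosen for illustration:

```python
import numpy as np

# Cross-entropy loss for a single example (illustrative sketch).
# y is a one-hot label vector, y_hat a predicted probability distribution over q classes.
def cross_entropy(y, y_hat):
    return -np.sum(y * np.log(y_hat))

y = np.array([0.0, 1.0, 0.0])       # true class is index 1
y_hat = np.array([0.1, 0.7, 0.2])   # predicted probabilities (sum to 1)

print(cross_entropy(y, y_hat))      # equals -log(0.7) ≈ 0.357
```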

Because $\mathbf{y}$ is a one-hot vector, the sum vanishes for all terms except the one corresponding to the true class. The loss is bounded below by $0$ (probabilities cannot exceed $1$, so their negative logarithm cannot be lower than $0$), and it equals $0$ only if the model predicts the true label with complete certainty. However, reaching a probability of exactly $1$ requires infinite logits, so the loss is never exactly $0$ for finite weights. Conversely, assigning a probability of $0$ to the true label would incur an infinite loss ($-\log 0 = \infty$).
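To make these bounds concrete, the short sketch below (hypothetical values, not from the book) evaluates $-\log \hat{y}_{\text{true}}$ for a few predicted probabilities of the true class:

```python
import numpy as np

# Since y is one-hot, the loss reduces to -log of the probability
# assigned to the true class; these probabilities are illustrative.
for p_true in [0.99, 0.9, 0.5, 0.1, 1e-8]:
    print(f"p(true class) = {p_true:<8}  loss = {-np.log(p_true):.4f}")

# As p(true class) -> 1 the loss approaches 0 (but never reaches it for finite logits),
# and as p(true class) -> 0 the loss grows without bound.
```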


Updated 2026-05-03

Tags

Data Science

D2L

Dive into Deep Learning @ D2L