Concept

Why do we want to minimize cross-entropy loss?

  1. A perfect classifier would assign probability 1 to the correct outcome (y = 1 or y = 0) and probability 0 to the incorrect outcome. That means that if y equals 1, the higher ŷ is (the closer it is to 1), the better the classifier, and the lower ŷ is (the closer it is to 0), the worse the classifier. If y equals 0, instead, the higher 1 − ŷ is (the closer it is to 1), the better the classifier. The negative log of ŷ (if the true y equals 1) or of 1 − ŷ (if the true y equals 0) is a convenient loss metric, since it ranges from 0 (negative log of 1, no loss) to infinity (negative log of 0, infinite loss); see the sketch after this list.

  2. This loss function also ensures that as the probability of the correct answer is maximized, the probability of the incorrect answer is minimized: since the two probabilities sum to one, any increase in the probability of the correct answer comes at the expense of the incorrect answer.
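To make points 1 and 2 concrete: for a single example, the binary cross-entropy loss is L(y, ŷ) = −[y·log(ŷ) + (1 − y)·log(1 − ŷ)], which reduces to −log(ŷ) when y = 1 and to −log(1 − ŷ) when y = 0. The Python sketch below evaluates this for a few predictions to show the loss shrinking toward 0 for confident correct answers and blowing up for confident wrong ones (the function name `binary_cross_entropy` and the `eps` clipping value are illustrative choices, not from the original text).

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy: -[y*log(yhat) + (1 - y)*log(1 - yhat)].

    eps (an illustrative choice) clips predictions away from exactly
    0 or 1 so that log never returns -inf.
    """
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Confident and correct (y = 1, yhat close to 1) -> loss near 0
print(binary_cross_entropy(1.0, 0.99))   # ~0.01
# Confident and wrong (y = 1, yhat close to 0) -> very large loss
print(binary_cross_entropy(1.0, 0.01))   # ~4.61
# For y = 0 the roles swap: the loss depends on 1 - yhat
print(binary_cross_entropy(0.0, 0.01))   # ~0.01
```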


Updated 2022-07-05

Tags

Deep Learning

Data Science