Softmax Function
To convert raw, unnormalized outputs into valid probabilities, the softmax function applies the exponential function to each component and then normalizes the results by their sum. The exponentiation ensures that all probabilities are non-negative, while the division ensures that they sum to 1. Mathematically, given a vector of scores $\mathbf{z} = (z_1, \dots, z_K)$, the predicted probability distribution is defined as:

$$p_i = \mathrm{softmax}(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K.$$

This guarantees that $0 \le p_i \le 1$ and $\sum_{i=1}^{K} p_i = 1$. Unlike other normalizations or the probit model, the softmax function preserves the order of the input scores and leads to a well-behaved optimization problem.
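As a concrete illustration, here is a minimal Python/NumPy sketch of this computation (the function name stable_softmax and the example scores are illustrative, not from the original text). It subtracts the maximum score before exponentiating; because softmax is invariant to adding a constant to every input, this leaves the output unchanged while preventing numerical overflow on large scores:

```python
import numpy as np

def stable_softmax(z):
    """Map raw scores (logits) z to a probability distribution.

    Subtracting max(z) before exponentiating does not change the
    result (softmax is invariant to adding a constant to every score),
    but it keeps np.exp from overflowing on large logits.
    """
    z = np.asarray(z, dtype=np.float64)
    exp_z = np.exp(z - np.max(z))  # shift, then exponentiate: all values > 0
    return exp_z / exp_z.sum()     # normalize so the outputs sum to 1

# Example: next-word scores such as 'mat' (6.0), 'rug' (3.0),
# 'floor' (0.5), 'chair' (0.5)
probs = stable_softmax([6.0, 3.0, 0.5, 0.5])
print(probs.round(3))  # [0.945 0.047 0.004 0.004]
print(probs.sum())     # 1.0 (up to floating-point rounding)
```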
Related
Linear vs. Non-Linear Activation Functions
Sigmoid/Logistic Function
TanH/Hyperbolic Tangent Function
Swish Function
ReLU (Rectified Linear Unit)
ELU (Exponential Linear Unit)
Which activation function is represented by each of these plots?
Which of the following introduces nonlinearity into neural networks?
Conditional Probability Formula for Autoregressive Models using Softmax
A language model is predicting the next word in a sequence. After processing the context, it has assigned the following unnormalized scores to a set of four candidate words: 'mat' (score=6.0), 'rug' (score=3.0), 'floor' (score=0.5), and 'chair' (score=0.5). To convert these scores into a valid probability distribution over this set, what is the final probability assigned to the word 'mat'?
A language model is evaluating three candidate tokens (A, B, C) to follow a given context. Initially, their scores are: Token A = 4, Token B = 4, Token C = 2. If the score for Token C is increased to 12, while the scores for Token A and Token B remain unchanged, how does this affect the normalized probabilities of Token A and Token B?
Comparing Model Confidence via Probability Normalization
Probit Model
Learn After
Pros and Cons of Softmax Function
Softmax Regression (Activation)
Parameterized Softmax Layer
Plackett-Luce Selection Probability Formula
Conditional Probability Formula for Autoregressive Models using Softmax
A neural network's final layer produces the raw output scores (logits) [2.0, 1.0, 0.1] for three possible classes. To convert these scores into class probabilities, a function is applied that first exponentiates each score and then normalizes these new values by dividing each by their sum. What is the resulting probability distribution? (Values are rounded to three decimal places).
A function is used to convert a vector of raw, unnormalized scores z = [z_1, z_2, ..., z_K] into a probability distribution. This function operates by first applying the standard exponential function to each score and then normalizing these new values by dividing each by their sum. If a constant value C is added to every score in the input vector z, resulting in a new vector z' = [z_1+C, z_2+C, ..., z_K+C], how will the resulting output probability distribution be affected?
Consider two input vectors of raw scores (logits) for a 3-class classification problem: Vector A = [1, 2, 3] and Vector B = [1, 5, 10]. Both vectors are passed through a function that exponentiates each score and then normalizes the results by dividing by their sum. How will the resulting probability distribution for Vector B compare to the one for Vector A?
You’re reviewing an internal evaluation script tha...
Your team is building an internal tool that ranks ...
You’re reviewing an internal LLM evaluation pipeli...
Reconciling Training Log-Likelihood with Inference-Time Sequence Selection
Explaining a Counterintuitive Decoding Outcome Using Softmax, Next-Token Conditionals, and Sequence Log-Probability
Diagnosing a “High-Confidence Wrong Token” Bug in Autoregressive Scoring
Investigating a Production Scoring Bug: Softmax Normalization vs. Autoregressive Sequence Log-Probability
Design a Correct Sequence-Scoring Function for Autoregressive LLM Outputs
Root-Cause Analysis: Why a “More Likely” Token-by-Token Completion Loses on Total Sequence Score
Auditing a Candidate Completion Using Softmax Next-Token Probabilities and Autoregressive Log-Probability
Derivative of Softmax Cross-Entropy Loss with Respect to Logits
Numerical Overflow in Softmax Function