
Example Calculation of Softmax Output Layer

Say we have $z^{[L]} = \begin{bmatrix} 2 & -1 & 3 & 7 \end{bmatrix}$. Exponentiating each entry gives $t = \begin{bmatrix} e^{2} & e^{-1} & e^{3} & e^{7} \end{bmatrix} \approx \begin{bmatrix} 7.39 & 0.37 & 20.1 & 1096.6 \end{bmatrix}$. We sum the entries in $t$ to yield the denominator of the activation: $7.39 + 0.37 + 20.1 + 1096.6 = 1124.46$. Finally, we normalize by dividing each entry in $t$ by this sum, which gives the output probabilities: $a^{[L]} = \begin{bmatrix} 7.39/1124.46 & 0.37/1124.46 & 20.1/1124.46 & 1096.6/1124.46 \end{bmatrix} \approx \begin{bmatrix} 0.007 & 0.0003 & 0.018 & 0.975 \end{bmatrix}$.
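The same three steps (exponentiate, sum, divide) can be checked numerically. The snippet below is a minimal sketch in plain Python; the function name `softmax` and the variable names are illustrative choices, not part of the original example.

```python
import math


def softmax(z):
    """Apply the softmax steps from the worked example to a list of numbers."""
    # Step 1: exponentiate each entry of z to get t.
    t = [math.exp(value) for value in z]
    # Step 2: sum the entries of t to get the denominator.
    total = sum(t)
    # Step 3: divide each entry of t by that sum.
    return [value / total for value in t]


z = [2, -1, 3, 7]          # the z^{[L]} vector from the example
a = softmax(z)             # the resulting a^{[L]} probabilities

for logit, prob in zip(z, a):
    print(f"z = {logit:>2}  ->  a = {prob:.4f}")
print("sum of probabilities:", sum(a))
```

Because this version does not round the intermediate values of $t$, its results agree with the hand calculation to about three decimal places rather than digit for digit.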


Updated 2026-06-16

Tags

Data Science