Sample Calculation of Softmax Output Layer

Say we have $z^{[L]} = \begin{bmatrix} 2 \\ -1 \\ 3 \\ 7 \end{bmatrix}$.

Then, $t = \begin{bmatrix} e^{2} \\ e^{-1} \\ e^{3} \\ e^{7} \end{bmatrix} \approx \begin{bmatrix} 7.39 \\ 0.37 \\ 20.1 \\ 1096.6 \end{bmatrix}$.

We sum the entries in $t$ to yield the denominator of the activation: $7.39 + 0.37 + 20.1 + 1096.6 \approx 1124.5$.

Finally, normalize by dividing each entry in $t$ by the sum we just computed, which gives the output probabilities: $a^{[L]} = \begin{bmatrix} 7.39/1124.5 \\ 0.37/1124.5 \\ 20.1/1124.5 \\ 1096.6/1124.5 \end{bmatrix} \approx \begin{bmatrix} 0.007 \\ 0.0003 \\ 0.02 \\ 0.975 \end{bmatrix}$.
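
The three steps above (exponentiate, sum, divide) can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the function name `softmax` and the variable names are illustrative choices, not something defined in this note.

```python
import math

def softmax(z):
    """Apply the softmax steps from the worked example to a list of scores."""
    # Step 1: exponentiate each entry to get the temporary vector t.
    t = [math.exp(v) for v in z]
    # Step 2: sum the entries of t to get the normalizing denominator.
    total = sum(t)
    # Step 3: divide each entry of t by the denominator.
    return [v / total for v in t]

z = [2.0, -1.0, 3.0, 7.0]   # the z^{[L]} vector from the example
a = softmax(z)              # the resulting activations a^{[L]}
print([round(v, 4) for v in a])
print(sum(a))               # the activations should sum to 1, up to rounding
```

For large scores, `math.exp` can overflow, so practical implementations usually subtract the maximum of `z` from every entry before exponentiating; this leaves the result unchanged.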

Updated 2020-11-09

Tags

Data Science