Learn Before
Parameterized Softmax Layer
A parameterized Softmax layer incorporates a set of weights, W. This layer operates by first applying a linear transformation to the input hidden states, H, using the weight matrix W, and then passing the result through the standard Softmax function. This operation is formally defined by the equation: Softmax(H ⋅ W).
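A minimal NumPy sketch of this operation, assuming a single hidden state vector h and illustrative sizes (the function and variable names below are not from the course): the hidden state is multiplied by the weight matrix W, and the resulting scores are normalized with the standard Softmax.

```python
import numpy as np

def softmax(z):
    # Standard Softmax: exponentiate each score, then normalize by the sum.
    # Subtracting the max is a common numerical-stability trick; it does not
    # change the result because Softmax is shift-invariant.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def parameterized_softmax(h, W):
    # Parameterized Softmax layer: Softmax(h . W)
    return softmax(h @ W)

# Illustrative sizes: hidden dimension 4, vocabulary size 3.
rng = np.random.default_rng(0)
h = rng.normal(size=4)        # input hidden state vector
W = rng.normal(size=(4, 3))   # weight matrix: [hidden_dimension x vocabulary_size]
p = parameterized_softmax(h, W)
print(p, p.sum())             # a probability distribution summing to 1
```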

References
Foundations of Large Language Models Course
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Pros and Cons of Softmax Function
Softmax Regression (Activation)
Parameterized Softmax Layer
Plackett-Luce Selection Probability Formula
Conditional Probability Formula for Autoregressive Models using Softmax
A neural network's final layer produces the raw output scores (logits) [2.0, 1.0, 0.1] for three possible classes. To convert these scores into class probabilities, a function is applied that first exponentiates each score and then normalizes these new values by dividing each by their sum. What is the resulting probability distribution? (Values are rounded to three decimal places.)
A function is used to convert a vector of raw, unnormalized scores z = [z_1, z_2, ..., z_K] into a probability distribution. This function operates by first applying the standard exponential function to each score and then normalizing these new values by dividing each by their sum. If a constant value C is added to every score in the input vector z, resulting in a new vector z' = [z_1+C, z_2+C, ..., z_K+C], how will the resulting output probability distribution be affected?
Consider two input vectors of raw scores (logits) for a 3-class classification problem: Vector A = [1, 2, 3] and Vector B = [1, 5, 10]. Both vectors are passed through a function that exponentiates each score and then normalizes the results by dividing by their sum. How will the resulting probability distribution for Vector B compare to the one for Vector A?
You’re reviewing an internal evaluation script tha...
Your team is building an internal tool that ranks ...
You’re reviewing an internal LLM evaluation pipeli...
Reconciling Training Log-Likelihood with Inference-Time Sequence Selection
Explaining a Counterintuitive Decoding Outcome Using Softmax, Next-Token Conditionals, and Sequence Log-Probability
Diagnosing a “High-Confidence Wrong Token” Bug in Autoregressive Scoring
Investigating a Production Scoring Bug: Softmax Normalization vs. Autoregressive Sequence Log-Probability
Design a Correct Sequence-Scoring Function for Autoregressive LLM Outputs
Root-Cause Analysis: Why a “More Likely” Token-by-Token Completion Loses on Total Sequence Score
Auditing a Candidate Completion Using Softmax Next-Token Probabilities and Autoregressive Log-Probability
Derivative of Softmax Cross-Entropy Loss with Respect to Logits
Numerical Overflow in Softmax Function
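A short NumPy sketch, assuming illustrative inputs (it is not taken from any of the linked exercises), that works through the Softmax behaviors referenced in the items above: computing the distribution for the logits [2.0, 1.0, 0.1], checking that adding a constant C leaves the output unchanged, and avoiding overflow by subtracting the maximum logit.

```python
import numpy as np

def softmax(z):
    # Exponentiate each score, then normalize by the sum.
    # Subtracting the max prevents overflow for large logits and does not
    # change the result, because Softmax is invariant to a constant shift.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
print(np.round(softmax(logits), 3))   # approximately [0.659 0.242 0.099]

# Shift invariance: adding a constant C to every logit leaves the output unchanged.
C = 100.0
print(np.allclose(softmax(logits), softmax(logits + C)))   # True
```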
Learn After
Probability Distribution Formula for an Encoder-Softmax Language Model
Output Probability Calculation in Transformer Language Models
Next-Token Probability Calculation in Autoregressive Decoders
A neural network produces a final matrix of hidden state vectors, H, with dimensions [sequence_length × hidden_dimension]. To generate a probability distribution over a vocabulary of size V for each position in the sequence, a parameterized Softmax layer is used, which computes Softmax(H ⋅ W). What is the primary role and required shape of the weight matrix W in this operation?
Debugging a Parameterized Softmax Layer
A parameterized Softmax layer is used to convert a sequence of hidden state vectors into a sequence of probability distributions over a vocabulary. Arrange the following steps of this process into the correct chronological order.
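A minimal shape-checking sketch for the operation described in the two questions above, using illustrative names and sizes: H has shape [sequence_length × hidden_dimension], W projects each hidden state to vocabulary logits and so has shape [hidden_dimension × V], and Softmax is applied row-wise so that every position in the sequence receives a probability distribution over the vocabulary.

```python
import numpy as np

def softmax(z, axis=-1):
    # Row-wise Softmax: each row of logits becomes a probability distribution.
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

sequence_length, hidden_dimension, V = 5, 8, 10   # illustrative sizes
rng = np.random.default_rng(0)
H = rng.normal(size=(sequence_length, hidden_dimension))   # hidden state vectors
W = rng.normal(size=(hidden_dimension, V))                 # maps hidden states to vocabulary logits

probs = softmax(H @ W)        # shape: [sequence_length x V]
print(probs.shape)            # (5, 10)
print(probs.sum(axis=-1))     # each row sums to 1
```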