Learn Before
Formula

Parameterized Softmax Layer

A parameterized Softmax layer, denoted \(\mathrm{Softmax}_{\mathbf{W}}(\cdot)\), incorporates a set of weights \(\mathbf{W}\). The layer first applies a linear transformation to the input hidden states \(\mathbf{H}\) using the weight matrix \(\mathbf{W}\), then passes the result through the standard Softmax function. Formally: \(\mathrm{Softmax}_{\mathbf{W}}(\mathbf{H}) = \mathrm{Softmax}(\mathbf{H} \cdot \mathbf{W})\).
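The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the dimensions (4-dimensional hidden states, a 5-entry output vocabulary) and the function names are assumptions chosen for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max before exponentiating for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def parameterized_softmax(H, W):
    # Softmax_W(H) = Softmax(H @ W): linear transform, then row-wise Softmax.
    return softmax(H @ W)

# Example: 3 hidden states of width 4, projected onto 5 output entries.
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))   # input hidden states
W = rng.normal(size=(4, 5))   # learned weight matrix
P = parameterized_softmax(H, W)
print(P.shape)        # (3, 5)
print(P.sum(axis=1))  # each row is a probability distribution summing to 1
```

Each row of the output is a valid probability distribution, which is why this construction is used to map hidden states to token probabilities.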


Updated 2026-05-02

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences