Probability Distribution Formula for an Encoder-Softmax Language Model
When an encoder model parameterized by $\theta$ processes an input sequence $\mathbf{x} = x_0 x_1 \ldots x_m$ and is followed by a Softmax layer parameterized by a weight matrix $\mathbf{W}$, it outputs a sequence of probability distributions. This operation is mathematically expressed as:

$$[\mathbf{p}_0^{\theta,\mathbf{W}}, \mathbf{p}_1^{\theta,\mathbf{W}}, \ldots, \mathbf{p}_m^{\theta,\mathbf{W}}] = \mathrm{Softmax}(\mathbf{H} \cdot \mathbf{W}), \qquad \mathbf{H} = \mathrm{Encode}_{\theta}(\mathbf{x})$$

In this formula, each $\mathbf{p}_i^{\theta,\mathbf{W}}$ represents the conditional output distribution at sequence position $i$: a vector with one entry per vocabulary item, whose entries sum to one. For notational simplicity, the superscripts $\theta$ and $\mathbf{W}$ affixed to each probability distribution are sometimes dropped, so that $\mathbf{p}_i^{\theta,\mathbf{W}}$ is written simply as $\mathbf{p}_i$.
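As a minimal sketch of this computation (a toy NumPy example; the shapes, random values, and the `softmax` helper are assumptions made for illustration, not code from the course), the whole operation reduces to one matrix product followed by a row-wise Softmax:

```python
import numpy as np

# Toy sizes, chosen arbitrarily for illustration.
seq_len, hidden_dim, vocab_size = 3, 8, 7

rng = np.random.default_rng(0)
H = rng.normal(size=(seq_len, hidden_dim))     # stands in for H = Encode_theta(x)
W = rng.normal(size=(hidden_dim, vocab_size))  # Softmax-layer weight matrix

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

P = softmax(H @ W)                             # row i is p_i, the distribution at position i
assert P.shape == (seq_len, vocab_size)
assert np.allclose(P.sum(axis=1), 1.0)         # each row is a valid probability distribution
```

Row $i$ of `P` is exactly $\mathbf{p}_i$; stacking the encoder outputs into $\mathbf{H}$ is what lets all $m+1$ distributions be computed in a single matrix product.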
References
Reference of Foundations of Large Language Models Course
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Inference Process with a Fine-Tuned Model
A language model has been trained on a large corpus of English text. When given the sentence 'The chef carefully seasoned the soup with a pinch of ____.', which of the following best represents the direct output the model calculates for the blank position?
Evaluating Sentence Probability
Impact of Training Data on Probability
Output Probability Calculation in Transformer Language Models
Next-Token Probability Calculation in Autoregressive Decoders
A neural network produces a final matrix of hidden state vectors, H, with dimensions [sequence_length × hidden_dimension]. To generate a probability distribution over a vocabulary of size V for each position in the sequence, a parameterized Softmax layer is used, which computes Softmax(H ⋅ W). What is the primary role and required shape of the weight matrix W in this operation?
Debugging a Parameterized Softmax Layer
A parameterized Softmax layer is used to convert a sequence of hidden state vectors into a sequence of probability distributions over a vocabulary. Arrange the following steps of this process into the correct chronological order.
Auto-Regressive Generation Process
Formal Definition of LLM Inference
Model Parameterization by θ
A language model built with a deep neural network is given the input sequence 'The cat sat on the'. The model's vocabulary consists of the following tokens: {a, cat, hat, mat, on, sat, the}. What does the model produce as its immediate, direct output to predict the very next token?
Analyzing Language Model Outputs
Explaining Language Model Output Behavior
Equation for Generating Sequence Representations
A pre-trained sequence encoding model processes the input sentence 'The quick fox'. After tokenization, the input is a sequence of 3 tokens: {'The', 'quick', 'fox'}. The model then generates a numerical representation, H, which is a matrix of real-valued vectors. Based on the typical function of such a model, which statement best describes the output matrix H?
Contextual Representation Analysis
Consider a pre-trained sequence encoding model that generates a numerical representation H = {h_0, h_1, ..., h_m} for an input sequence of tokens x = {x_0, x_1, ..., x_m}. The vector h_i representing the token x_i will be the same regardless of the other tokens that appear alongside it in the input sequence.
Learn After
Simplified Notation for Parameterized Models
Comparison of Output Probability Meaning: Language Modeling vs. Encoder Pre-training
A language model computes probability distributions for a sequence of tokens x using a two-stage process: an encoder with parameters θ generates representations, which are then passed to a Softmax layer with a weight matrix W. This model is consistently outputting a nearly uniform probability distribution for every token position, meaning every word in the vocabulary is considered almost equally likely, regardless of the input. Which of the following is the most direct and plausible explanation for this behavior?
Evaluating Component Independence in a Language Model
A language model calculates the probability distribution for each token in an input sequence, x, by first generating a sequence of numerical representations and then applying a final transformation. Arrange the following steps in the correct computational order to produce the probability vector, p_i, for the token at a specific position i. (A step-by-step sketch of this ordering follows this list.)
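As a hedged illustration of the ordering the last question asks about (a toy NumPy sketch; the shapes and random values are assumptions made for this example, not material from the course), the per-position computation runs encoder output, then projection by W, then exponentiation and normalization:

```python
import numpy as np

hidden_dim, vocab_size = 8, 7                  # toy sizes for illustration
rng = np.random.default_rng(1)

h_i = rng.normal(size=(hidden_dim,))           # step 1: encoder representation h_i at position i
W = rng.normal(size=(hidden_dim, vocab_size))  # Softmax-layer weight matrix

logits = h_i @ W                               # step 2: project h_i onto the vocabulary
p_i = np.exp(logits - logits.max())            # step 3: exponentiate (max subtracted for stability)...
p_i /= p_i.sum()                               # ...and normalize, giving p_i = Softmax(h_i W)

assert np.isclose(p_i.sum(), 1.0)              # p_i is a distribution over the vocabulary
```

The same three steps, applied to every row of H at once, give the batched form $\mathrm{Softmax}(\mathbf{H} \cdot \mathbf{W})$ shown at the top of this note.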