Formula

Batched Softmax Function

The softmax function maps a matrix of raw scalar outputs, denoted as X\mathbf{X}, into a valid probability distribution. It transforms each element into a non-negative number and ensures that each row sums to 11. Computing the softmax involves three steps: first, exponentiating each term; second, computing a normalization constant by summing over each row; and third, dividing each row by its corresponding normalization constant. The mathematical formula is expressed as:

softmax(X)ij=exp(Xij)kexp(Xik)\mathrm{softmax}(\mathbf{X})_{ij} = \frac{\exp(\mathbf{X}_{ij})}{\sum_k \exp(\mathbf{X}_{ik})}

0

1

Updated 2026-05-03

Contributors are:

Who are from:

Tags

D2L

Dive into Deep Learning @ D2L