Formula
Batched Softmax Function
The softmax function maps a matrix of raw scalar outputs, denoted as $\mathbf{X}$, into valid probability distributions, one per row. It transforms each element into a non-negative number and ensures that each row sums to $1$. Computing the softmax involves three steps: first, exponentiating each term; second, computing a normalization constant for each row by summing its exponentiated terms; and third, dividing each row by its corresponding normalization constant. The mathematical formula is expressed as:

$$\mathrm{softmax}(\mathbf{X})_{ij} = \frac{\exp(\mathbf{X}_{ij})}{\sum_{k} \exp(\mathbf{X}_{ik})}$$
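The three steps described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `batched_softmax`, the list-of-rows representation, and the sample values are all assumptions made for the example.

```python
import math


def batched_softmax(X):
    """Row-wise softmax of a matrix given as a list of rows (hypothetical helper)."""
    result = []
    for row in X:
        # Step 1: exponentiate each term.
        exps = [math.exp(x) for x in row]
        # Step 2: sum over the row to get its normalization constant.
        partition = sum(exps)
        # Step 3: divide each entry by the row's normalization constant.
        result.append([e / partition for e in exps])
    return result


# Illustrative input: two rows of raw scores (made-up values).
X = [[1.0, 2.0, 3.0],
     [0.0, 0.0, 0.0]]
probs = batched_softmax(X)

# Each row of the output should sum to 1 (up to floating-point error).
row_sums = [sum(row) for row in probs]
```

Exponentiating directly, as this sketch does, can overflow for large inputs (`math.exp` raises `OverflowError`). A common remedy is to subtract each row's maximum before exponentiating, which leaves the result unchanged mathematically.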
Updated 2026-05-03
Tags
D2L
Dive into Deep Learning @ D2L