Learn Before
Naive Implementation of the Softmax Function
A basic implementation of the softmax function involves exponentiating the input tensor X to get X_exp, computing the partition function by summing along the appropriate axis while retaining the dimension, and dividing the exponentiated tensor by the partition function, relying on broadcasting. For instance, a naive implementation from scratch could be written as:
import torch

def softmax(X):
    X_exp = torch.exp(X)                     # exponentiate every element
    partition = X_exp.sum(1, keepdim=True)   # row-wise sum, keep the axis for broadcasting
    return X_exp / partition                 # normalize each row so it sums to 1
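As a quick check (a minimal sketch assuming PyTorch is installed; the input below is a hypothetical example), applying the function to a random matrix should yield non-negative entries with each row summing to 1:

X = torch.rand((2, 5))    # example input: 2 rows, 5 columns
X_prob = softmax(X)
print(X_prob)             # all entries between 0 and 1
print(X_prob.sum(1))      # each row sums to 1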
While this naive code clearly illustrates the mathematical steps, it is not robust against very large or very small input arguments X, which can lead to numerical overflow or underflow. Therefore, in practice, it is standard to use the built-in, numerically stable softmax functions provided by deep learning frameworks.
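For illustration, one common way to achieve numerical stability is to subtract the row-wise maximum before exponentiating: this leaves the result mathematically unchanged but keeps the exponents bounded. The sketch below assumes PyTorch and shows both this manual trick and the built-in torch.nn.functional.softmax, which handles stability internally:

import torch
import torch.nn.functional as F

def stable_softmax(X):
    # Subtracting the row-wise max does not change the output, since
    # exp(x - c) / sum(exp(x - c)) == exp(x) / sum(exp(x)),
    # but it prevents overflow in torch.exp for large inputs.
    X_shifted = X - X.max(dim=1, keepdim=True).values
    X_exp = torch.exp(X_shifted)
    return X_exp / X_exp.sum(dim=1, keepdim=True)

X = torch.tensor([[1000.0, 1000.0], [-1000.0, -1000.0]])  # extreme values that break the naive version
print(stable_softmax(X))    # tensor([[0.5, 0.5], [0.5, 0.5]])
print(F.softmax(X, dim=1))  # built-in, numerically stable, same result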
Tags
D2L
Dive into Deep Learning @ D2L