Concept

Vectorized Minibatch Softmax Regression

To maximize computational efficiency, the forward pass of a softmax regression model is typically vectorized across minibatches. For a minibatch of inputs $\mathbf{X} \in \mathbb{R}^{n \times d}$ containing $n$ examples with $d$ features, and parameters $\mathbf{W} \in \mathbb{R}^{d \times q}$ (weights) and $\mathbf{b} \in \mathbb{R}^{1 \times q}$ (biases), the unnormalized logits are computed with the affine transformation $\mathbf{O} = \mathbf{X}\mathbf{W} + \mathbf{b}$. The softmax function is then applied rowwise to $\mathbf{O}$ to yield the normalized class probabilities $\hat{\mathbf{Y}} = \mathrm{softmax}(\mathbf{O})$ for the entire batch simultaneously.
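The vectorized forward pass above can be sketched in NumPy as follows. The shapes ($n=4$, $d=3$, $q=5$) and the random inputs are illustrative assumptions, not from the source; the rowwise max subtraction is a standard numerical-stability trick that leaves the softmax output unchanged.

```python
import numpy as np

def softmax(O):
    """Apply softmax rowwise to a matrix of logits O."""
    # Subtract the rowwise max for numerical stability;
    # this does not change the resulting probabilities.
    O_shifted = O - O.max(axis=1, keepdims=True)
    exp_O = np.exp(O_shifted)
    return exp_O / exp_O.sum(axis=1, keepdims=True)

# Hypothetical minibatch: n=4 examples, d=3 features, q=5 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))   # inputs,  shape (n, d)
W = rng.normal(size=(3, 5))   # weights, shape (d, q)
b = np.zeros((1, 5))          # biases,  shape (1, q), broadcast over rows

O = X @ W + b                 # logits O = XW + b, shape (n, q)
Y_hat = softmax(O)            # each row is a probability distribution
```

Because `b` has shape `(1, q)`, NumPy broadcasting adds the same bias vector to every row of `XW`, so the whole batch is processed in a single matrix multiply rather than a Python loop over examples.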

Updated 2026-05-03

Tags

D2L

Dive into Deep Learning @ D2L
