Learn Before
Hierarchical Softmax Formula
The Hierarchical Softmax is an efficient alternative to the standard Softmax function for models with large output vocabularies. It works by partitioning the vocabulary into classes or 'nodes'. The probability of a specific item j, which belongs to node u, factorizes into the probability of choosing node u and the probability of choosing j within that node, so j's score is normalized only against the scores of the other items in the same node rather than against the entire vocabulary. The formula is expressed as:

$$\Pr(j) \;=\; \Pr(u) \cdot \Pr(j \mid u) \;=\; \Pr(u) \cdot \frac{\exp(s_j)}{\sum_{k=1}^{n_u} \exp(s_k)}$$

In this equation, the numerator represents the exponentiated score for item j. The denominator is the normalization term, calculated by summing the exponentiated scores of the n_u items that belong to node u. Because each sum runs over a single node instead of the full vocabulary, the per-prediction cost drops from O(|V|) to roughly O(√|V|) for a balanced two-level partition, and to O(log |V|) when the nodes form a binary tree.
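As a concrete illustration, here is a minimal NumPy sketch of the two-level factorization above. All names and dimensions (h, W_nodes, W_items, the uniform node size) are hypothetical, chosen only to show that each normalization runs over a single node and that the factorized probabilities still sum to one over the whole vocabulary:

```python
import numpy as np

# Hypothetical setup: hidden state h, one weight matrix scoring the nodes,
# and one weight matrix per node scoring the items inside it.
rng = np.random.default_rng(0)
d = 8                   # hidden size (illustrative)
n_nodes = 4             # number of nodes the vocabulary is partitioned into
node_size = 5           # n_u: number of items in each node (uniform for simplicity)

h = rng.normal(size=d)
W_nodes = rng.normal(size=(n_nodes, d))             # node-level scores
W_items = rng.normal(size=(n_nodes, node_size, d))  # item scores within each node

def softmax(s):
    s = s - s.max()     # shift scores for numerical stability
    e = np.exp(s)
    return e / e.sum()

def hsm_prob(u, j):
    """Pr(item j in node u) = Pr(u) * Pr(j | u)."""
    p_node = softmax(W_nodes @ h)[u]     # normalized over the n_nodes nodes only
    p_item = softmax(W_items[u] @ h)[j]  # normalized over the n_u items of node u only
    return p_node * p_item

# The factorized probabilities still sum to 1 over the whole vocabulary.
total = sum(hsm_prob(u, j) for u in range(n_nodes) for j in range(node_size))
print(f"Pr(node 1, item 2) = {hsm_prob(1, 2):.4f}, total mass = {total:.4f}")
```

Each call evaluates one softmax over 4 scores and one over 5 scores, rather than a single softmax over all 20 items; this gap is what makes the factorization pay off at vocabulary scale.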

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Related
Hierarchical Softmax Formula
A machine learning team is training a language model with a vocabulary of over one million unique words. They decide to replace the standard output layer, which calculates a probability for every single word, with an architecture that organizes words into a binary tree. In this new setup, the probability of a target word is calculated by multiplying the probabilities of the choices made at each node along the path from the tree's root to the word's specific leaf. What is the most likely trade-off the team will face by making this change?
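For intuition, the path-product calculation this scenario describes can be sketched as follows. The tree depth, the per-node parameter vectors, and the word's left/right code are all hypothetical stand-ins here; a real implementation (e.g. word2vec-style hierarchical softmax) learns the node parameters from data:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical path for one target word: with ~1M words, a balanced binary
# tree is about 20 levels deep, so reaching a leaf takes ~20 binary choices.
rng = np.random.default_rng(1)
d = 8
depth = 20
h = rng.normal(size=d)                   # model's hidden state
node_vecs = rng.normal(size=(depth, d))  # one parameter vector per node on the path
code = rng.integers(0, 2, size=depth)    # the word's code: 1 = go left, 0 = go right

# Pr(word) = product over the path of the probability of the branch actually taken.
p = 1.0
for v, go_left in zip(node_vecs, code):
    p_left = sigmoid(v @ h)              # probability of branching left at this node
    p *= p_left if go_left else (1.0 - p_left)

print(f"Pr(target word) after {depth} binary decisions: {p:.3e}")
```

Scoring one target word thus costs about log2(|V|) ≈ 20 sigmoid evaluations instead of a million-way normalization, which is the efficiency side of the trade-off the question asks about.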
Computational Cost of Output Architectures
Probability Calculation in a Hierarchical Output Layer