
Energy-Based View of Softmax

The softmax function has roots in statistical physics, closely mirroring the Boltzmann distribution, where the probability of a thermodynamic state with energy $E$ at temperature $T$ is proportional to $\exp(-E/kT)$ (with $k$ the Boltzmann constant). By treating the model's error (or negative logit) as an energy, the softmax formulation naturally yields a probability distribution over states, forming the conceptual basis for energy-based models in deep learning.
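As a minimal sketch of this correspondence, the snippet below computes a Boltzmann-style softmax over a hypothetical vector of energies, absorbing the constant $k$ into the temperature; the function name and example values are illustrative, not from the original text.

```python
import numpy as np

def softmax_from_energies(energies, temperature=1.0):
    """Boltzmann distribution: p_i ∝ exp(-E_i / T).

    Equivalent to softmax applied to negated, temperature-scaled energies.
    """
    logits = -np.asarray(energies, dtype=float) / temperature
    logits -= logits.max()  # subtract max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

# Hypothetical energies: lower energy ⇒ higher probability
p = softmax_from_energies([1.0, 2.0, 3.0])
```

Raising the temperature flattens the distribution toward uniform, while lowering it concentrates probability on the lowest-energy state, mirroring the physical behavior of the Boltzmann distribution.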

Updated 2026-05-03

Tags: D2L

Source: Dive into Deep Learning @ D2L