Learn Before
Concept
Finite Sample Distribution for Stochastic Gradient Descent
For a finite sample size , the empirical data distribution is modeled as a discrete probability distribution , where denotes the Dirac delta function. This discrete distribution theoretically justifies performing stochastic gradient descent over a finite dataset by drawing independent samples from it.
0
1
Updated 2026-05-15
Tags
D2L
Dive into Deep Learning @ D2L