Concept

Finite Sample Distribution for Stochastic Gradient Descent

For a finite sample size $n$, the empirical data distribution is modeled as the discrete probability distribution $p(x, y) = \frac{1}{n} \sum_{i=1}^n \delta_{x_i}(x) \delta_{y_i}(y)$, where $\delta$ denotes the Dirac delta function. This discrete distribution theoretically justifies performing stochastic gradient descent over a finite dataset by drawing independent samples $(x_i, y_i)$ from it.
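As a minimal sketch of what this means in practice: sampling from the empirical distribution amounts to picking one of the $n$ stored pairs uniformly at random, with replacement, and the average of the per-example gradients over the dataset is exactly the expected gradient under that distribution. The toy dataset, the scalar squared loss, and the names `sample_empirical`, `grad_squared_loss`, `full_gradient`, and `sgd` below are illustrative assumptions, not part of the original text.

```python
import random


def sample_empirical(data, rng):
    """Draw one (x, y) pair from the empirical distribution.

    Each of the n stored pairs carries probability mass 1/n, so a draw is
    a uniform choice of index (sampling with replacement).
    """
    return data[rng.randrange(len(data))]


def grad_squared_loss(w, x, y):
    """Gradient of 0.5 * (w * x - y)**2 with respect to the scalar weight w."""
    return (w * x - y) * x


def full_gradient(w, data):
    """Expected per-example gradient under the empirical distribution."""
    return sum(grad_squared_loss(w, x, y) for x, y in data) / len(data)


def sgd(data, steps=2000, lr=0.05, seed=0):
    """Plain SGD: one independent draw from the empirical distribution per step."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        x, y = sample_empirical(data, rng)
        w -= lr * grad_squared_loss(w, x, y)
    return w


# Hypothetical noise-free toy dataset generated by y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w_hat = sgd(data)
print(w_hat)
```

Because every draw is independent and uniform over the dataset, each stochastic gradient is an unbiased estimate of `full_gradient(w, data)`. Sweeping through a shuffled dataset once per epoch, as is common in practice, samples without replacement and is a slightly different procedure.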

Updated 2026-05-15

Tags

D2L

Dive into Deep Learning @ D2L