Learn Before
Concept
Computational Cost of Gradient Descent
When using standard gradient descent, the computational cost for each parameter update iteration is , where is the number of examples in the training dataset. Because the full gradient computation requires evaluating the gradient of the loss function for every example, the cost grows linearly with the dataset size . Consequently, standard gradient descent becomes highly expensive per iteration when applied to very large training datasets.
0
1
Updated 2026-05-15
Tags
D2L
Dive into Deep Learning @ D2L