Concept

Computational Cost of Gradient Descent

When using standard gradient descent, the computational cost of each parameter update is $\mathcal{O}(n)$, where $n$ is the number of examples in the training dataset. The full gradient $\nabla f(\mathbf{x}) = \frac{1}{n}\sum_{i=1}^{n} \nabla f_i(\mathbf{x})$ averages the gradients of the per-example loss functions $f_i$, so every update must evaluate all $n$ of them, and the cost grows linearly with the dataset size $n$. Consequently, each iteration of standard gradient descent becomes very expensive on large training datasets.
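A minimal sketch of one full-batch gradient descent update, assuming a hypothetical one-dimensional linear model y ≈ w * x + b with squared loss. The function names per_example_grad and full_gradient_step are illustrative, not taken from D2L. The counter makes the linear cost visible: a single update evaluates one per-example gradient for every training example.

```python
# Hypothetical example: 1-D linear model with squared loss.
# Function names are illustrative, not from any library.

def per_example_grad(w, b, x, y):
    """Gradient of 0.5 * (w*x + b - y)**2 with respect to (w, b)."""
    residual = w * x + b - y
    return residual * x, residual

def full_gradient_step(w, b, xs, ys, lr):
    """One gradient descent update that visits all n examples."""
    n = len(xs)
    grad_w, grad_b = 0.0, 0.0
    evaluations = 0
    for x, y in zip(xs, ys):  # one pass over the whole dataset: O(n) work
        gw, gb = per_example_grad(w, b, x, y)
        grad_w += gw
        grad_b += gb
        evaluations += 1
    # The full gradient is the average of the per-example gradients.
    grad_w /= n
    grad_b /= n
    return w - lr * grad_w, b - lr * grad_b, evaluations

xs = [float(i) for i in range(1000)]
ys = [2.0 * x + 1.0 for x in xs]
w, b, evals = full_gradient_step(0.0, 0.0, xs, ys, lr=1e-7)
print(evals)  # 1000
```

Doubling the dataset doubles the number of gradient evaluations per update, which is exactly the $\mathcal{O}(n)$ per-iteration cost described above.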


Updated 2026-05-15


Tags

D2L

Dive into Deep Learning @ D2L