As the number of examples in a training dataset increases, the computational cost of a single iteration of standard gradient descent grows in proportion to the dataset size. Because of this per-iteration cost, stochastic gradient descent, which estimates the gradient from a single example or a small mini-batch, is generally preferred for large datasets.

Claude

In standard gradient descent, each parameter update has a computational cost of $$\mathcal{O}(n)$$, where $$n$$ is the number of examples in the training dataset. The full gradient is the sum (or average) of the per-example loss gradients, so every update must evaluate the gradient of the loss function for all $$n$$ examples, and the cost grows linearly with $$n$$. Consequently, each iteration of standard gradient descent becomes very expensive on large training datasets.
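The contrast in per-update cost can be illustrated with a minimal sketch. It assumes a least-squares loss and NumPy; the function names `full_gradient` and `stochastic_gradient`, the synthetic data, and the choice of loss are all hypothetical and chosen only for illustration. The full gradient touches all $$n$$ rows of the data, while the stochastic estimate touches a single row.

```python
import numpy as np

def full_gradient(w, X, y):
    """Gradient of the mean squared-error loss (1/2n) * sum((x_i . w - y_i)^2).

    Every one of the n examples contributes, so one call costs O(n)
    per-example gradient evaluations.
    """
    n = X.shape[0]
    residual = X @ w - y           # one residual per example (n of them)
    return X.T @ residual / n      # average of the n per-example gradients

def stochastic_gradient(w, X, y, rng):
    """Gradient estimate from one randomly chosen example.

    Only a single row of X is read, so the cost does not depend on n.
    """
    i = rng.integers(X.shape[0])
    return X[i] * (X[i] @ w - y[i])

# Synthetic data (assumed purely for illustration).
rng = np.random.default_rng(0)
n, d = 1000, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(d)
g_full = full_gradient(w, X, y)            # reads all n examples
g_sgd = stochastic_gradient(w, X, y, rng)  # reads one example
```

Doubling `n` roughly doubles the work done inside `full_gradient`, whereas `stochastic_gradient` does the same amount of work regardless of `n`. The trade-off is that the single-example estimate is noisy, so more updates are typically needed.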
