Example

Batch GD Slow Convergence on the Airfoil Dataset

When batch gradient descent is applied to the Airfoil Self-Noise dataset by setting the minibatch size equal to the total number of training examples (1,500), the model parameters are updated only once per epoch. With a learning rate of 1 over 10 epochs, the loss settles at approximately 0.247, and each epoch takes about 0.020 seconds. However, progress is minimal: after roughly 6 parameter updates, the loss curve plateaus and further improvement stalls. This demonstrates the fundamental limitation of full-batch gradient descent: each epoch provides only a single update, so reaching a finely converged solution requires many epochs, even though each individual epoch is fast to compute.
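The link between minibatch size and the number of updates per epoch can be made concrete with a small sketch. The code below is not the original experiment: it assumes a synthetic stand-in for the airfoil data (1,500 examples, 5 standardized features, a noisy linear target) and a plain linear model, and the helper names `make_data`, `mean_loss`, and `train` are hypothetical. The full-batch run reuses the settings described above (learning rate 1, 10 epochs); the minibatch learning rate of 0.05 is an arbitrary illustrative choice. Loss values will differ from those reported for the real dataset.

```python
import random


def make_data(n=1500, d=5, seed=0):
    """Synthetic stand-in (an assumption): standardized features, noisy linear target."""
    rng = random.Random(seed)
    true_w = [rng.uniform(-1, 1) for _ in range(d)]
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    y = [sum(w * x for w, x in zip(true_w, row)) + rng.gauss(0, 0.5) for row in X]
    return X, y


def mean_loss(X, y, w, b):
    """Mean of the halved squared error over all examples."""
    total = 0.0
    for row, target in zip(X, y):
        err = sum(wi * xi for wi, xi in zip(w, row)) + b - target
        total += 0.5 * err * err
    return total / len(X)


def train(X, y, lr, batch_size, num_epochs, seed=0):
    """Minibatch gradient descent on a linear model.

    Returns the loss after each epoch and the total number of parameter updates.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    updates, history = 0, []
    indices = list(range(n))
    for _ in range(num_epochs):
        rng.shuffle(indices)
        for start in range(0, n, batch_size):
            batch = indices[start:start + batch_size]
            grad_w, grad_b = [0.0] * d, 0.0
            # Accumulate the gradient of the halved squared error over the batch.
            for i in batch:
                err = sum(wi * xi for wi, xi in zip(w, X[i])) + b - y[i]
                for j in range(d):
                    grad_w[j] += err * X[i][j]
                grad_b += err
            m = len(batch)
            # One parameter update per minibatch, using the averaged gradient.
            w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
            b -= lr * grad_b / m
            updates += 1
        history.append(mean_loss(X, y, w, b))
    return history, updates


X, y = make_data()

# Full batch: minibatch size equals the dataset size, so one update per epoch.
gd_history, gd_updates = train(X, y, lr=1.0, batch_size=len(X), num_epochs=10)

# Minibatches of 10: 150 updates in a single pass over the same data.
sgd_history, sgd_updates = train(X, y, lr=0.05, batch_size=10, num_epochs=1)

print("full-batch updates:", gd_updates)
print("minibatch updates: ", sgd_updates)
print("full-batch loss per epoch:", [round(v, 4) for v in gd_history])
```

In this sketch the full-batch run performs exactly as many updates as there are epochs, while a single epoch with minibatches of 10 already performs 150. Printing `gd_history` shows how quickly the per-epoch loss stops changing under these assumed settings.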


Updated 2026-05-15

Tags

D2L

Dive into Deep Learning @ D2L