Batch GD Slow Convergence on the Airfoil Dataset
When batch gradient descent is applied to the Airfoil Self-Noise dataset by setting the minibatch size equal to the total number of training examples, the model parameters are updated only once per epoch. Each epoch is fast to compute because the whole gradient is evaluated in one vectorized pass, and the loss does decrease over the training run. However, progress is minimal: after the initial parameter updates, the loss curve plateaus and further improvement stalls. This demonstrates the fundamental limitation of full-batch gradient descent: each epoch provides only a single update, so achieving fine convergence requires many epochs even though each individual epoch is cheap.
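The update-count argument can be illustrated with a minimal NumPy sketch. It does not reproduce the D2L experiment: the data are synthetic stand-ins for the Airfoil features, and the function name `train_linear`, the dataset size, the learning rate, and the epoch count are all assumptions chosen for illustration.

```python
import numpy as np

def train_linear(X, y, batch_size, lr, num_epochs, seed=0):
    """Minibatch gradient descent on squared loss for a linear model.

    Setting batch_size == len(X) gives full-batch gradient descent.
    Returns the per-epoch training loss and the total number of updates.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    num_updates = 0
    losses = []
    for _ in range(num_epochs):
        perm = rng.permutation(n)  # reshuffle the examples each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            err = X[idx] @ w + b - y[idx]
            # Gradient of 0.5 * mean(err ** 2) with respect to w and b
            w -= lr * (X[idx].T @ err) / len(idx)
            b -= lr * err.mean()
            num_updates += 1
        losses.append(0.5 * np.mean((X @ w + b - y) ** 2))
    return losses, num_updates

# Synthetic regression data (assumption: NOT the Airfoil dataset).
rng = np.random.default_rng(42)
n, d = 1000, 5
X = rng.standard_normal((n, d))
true_w = rng.standard_normal(d)
y = X @ true_w + 0.1 * rng.standard_normal(n)

# Same learning rate and epoch budget; only the batch size differs.
gd_losses, gd_updates = train_linear(X, y, batch_size=n, lr=0.1, num_epochs=5)
mb_losses, mb_updates = train_linear(X, y, batch_size=100, lr=0.1, num_epochs=5)

print(f"full batch: {gd_updates} updates, final loss {gd_losses[-1]:.4f}")
print(f"minibatch:  {mb_updates} updates, final loss {mb_losses[-1]:.4f}")
```

With the same number of epochs, the full-batch run performs one update per epoch while the minibatch run performs ten, which is why the minibatch loss ends lower in this toy setting. The exact losses depend on the assumed data and hyperparameters.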
Dive into Deep Learning @ D2L