Concept

Mini-Batch Gradient Descent

Batch Gradient Descent requires processing the entire dataset to complete a single step. Since Gradient Descent can require many steps to converge, this makes the procedure very slow and inefficient for large datasets.

Mini-Batch Gradient Descent addresses this by drawing small random subsets of the data, called mini-batches, and using each one to estimate the gradient for a step. A mini-batch usually has a size greater than 1 but less than N (the dataset size). Because each update touches only a mini-batch rather than the full dataset, each step is much cheaper and the algorithm runs much faster on large datasets.
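
Below is a minimal NumPy sketch of the idea, using mini-batch gradient descent on a least-squares objective. The synthetic data, learning rate, batch size of 32, and epoch count are illustrative assumptions, not values from this note.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic regression data (not from the note)
N, d = 1000, 3
X = rng.normal(size=(N, d))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=N)

w = np.zeros(d)      # parameters to learn
lr = 0.1             # learning rate (assumed)
batch_size = 32      # mini-batch size: greater than 1, less than N
epochs = 20

for epoch in range(epochs):
    # Shuffle indices so each mini-batch is a random subset of the data
    perm = rng.permutation(N)
    for start in range(0, N, batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        # Gradient of mean squared error, estimated on the mini-batch only
        grad = (2.0 / len(idx)) * Xb.T @ (Xb @ w - yb)
        w -= lr * grad

print(w)  # should be close to true_w
```

Each update here costs only a batch-sized matrix product instead of a full pass over all N points, which is the source of the speedup the note describes.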


Updated 2026-05-02

Tags

Data Science

Foundations of Large Language Models Course

Computing Sciences

D2L

Dive into Deep Learning @ D2L