1Cademy - SGD Optimizer From-Scratch Implementation

Learn Before

Stochastic Gradient Descent Algorithm

Code

SGD Optimizer From-Scratch Implementation

A minimal from-scratch implementation of the stochastic gradient descent optimizer defines a function sgd(params, states, hyperparams) that accepts three arguments: a list of model parameters, optimizer states (unused for vanilla SGD), and a dictionary of hyperparameters. For each parameter tensor, the function subtracts the learning rate times the parameter's gradient from the parameter in place, then zeroes the gradient so that it does not accumulate into the next backward pass. In PyTorch:

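```python
def sgd(params, states, hyperparams):
    # `states` is unused here; it is kept for a uniform optimizer signature.
    for p in params:
        # In-place update: p <- p - lr * grad
        p.data.sub_(hyperparams['lr'] * p.grad)
        # Reset the gradient so the next backward pass starts from zero.
        p.grad.data.zero_()
```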
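As a rough illustration of how this function might be called, the sketch below fits a one-variable linear model with minibatch updates. The toy data (y = 2x + 1), the batch size, the learning rate, and the variable names w, b, X, and y are all assumptions made for this example and do not come from the original material.

```python
import torch

def sgd(params, states, hyperparams):
    for p in params:
        p.data.sub_(hyperparams['lr'] * p.grad)
        p.grad.data.zero_()

torch.manual_seed(0)

# Hypothetical noiseless toy data: y = 2x + 1
X = torch.linspace(-1, 1, 20).reshape(-1, 1)
y = 2.0 * X + 1.0

# Parameters of the linear model, tracked by autograd
w = torch.zeros(1, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
hyperparams = {'lr': 0.1}  # assumed learning rate
batch_size = 5             # assumed minibatch size

for epoch in range(200):
    # Shuffle the examples, then take one SGD step per minibatch
    perm = torch.randperm(X.shape[0])
    for i in range(0, X.shape[0], batch_size):
        idx = perm[i:i + batch_size]
        loss = ((X[idx] @ w + b - y[idx]) ** 2).mean()
        loss.backward()                  # populates w.grad and b.grad
        sgd([w, b], None, hyperparams)   # vanilla SGD ignores `states`
```

Because vanilla SGD keeps no auxiliary variables, passing None for states is enough here. After training, w and b should be close to 2 and 1 respectively.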

This function signature, taking params, states, and hyperparams, is deliberately general: more advanced optimizers introduced later (e.g., momentum, Adam) can share the same calling convention and use the states argument to hold auxiliary variables such as velocities or moment estimates.
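To make the role of the states argument concrete, here is one possible sketch of a momentum variant that follows the same calling convention. The names sgd_momentum and init_momentum_states, the 'momentum' hyperparameter key, and the particular update form (v <- momentum * v + grad, then p <- p - lr * v) are assumptions chosen for illustration; other momentum formulations exist.

```python
import torch

def init_momentum_states(params):
    # Hypothetical helper: one zero-initialized velocity tensor per parameter
    return [torch.zeros_like(p) for p in params]

def sgd_momentum(params, states, hyperparams):
    # Same signature as sgd, but `states` now carries the velocities
    for p, v in zip(params, states):
        # v <- momentum * v + grad (in place, so the state persists across calls)
        v.mul_(hyperparams['momentum']).add_(p.grad)
        # p <- p - lr * v
        p.data.sub_(hyperparams['lr'] * v)
        p.grad.data.zero_()

# Usage: two steps on f(w) = sum(w^2), whose gradient is 2w
w = torch.tensor([1.0, -2.0], requires_grad=True)
states = init_momentum_states([w])
hyperparams = {'lr': 0.1, 'momentum': 0.9}

for step in range(2):
    loss = (w ** 2).sum()
    loss.backward()
    sgd_momentum([w], states, hyperparams)

# After two steps, w is approximately [0.46, -0.92]
```

The caller creates the state once and passes the same list on every step, so the optimizer function itself stays stateless while the velocities are updated in place.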


Updated 2026-05-15


References

Dive into Deep Learning
