Learn Before
Formula

Gradient Descent Update Rule

The standard delta rule for gradient descent updates a model's parameters by taking a small step in the direction of the negative loss gradient. The new parameters $\theta_{t+1}$ are obtained according to the formula $\theta_{t+1} = \theta_t - lr \cdot \frac{\partial L_{\theta_t}(\mathcal{D}_{\mathrm{mini}})}{\partial \theta_t}$. In this equation, $\theta_t$ represents the latest parameters, $lr$ is the learning rate (the size of the small step), and the fractional term is the gradient of the loss function $L$ with respect to $\theta_t$, computed on a minibatch of training samples $\mathcal{D}_{\mathrm{mini}}$.
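The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not from the source: the mean-squared-error loss, the function name `sgd_step`, and the synthetic minibatch are all assumptions chosen to make the example self-contained.

```python
import numpy as np

def sgd_step(theta, X_mini, y_mini, lr=0.01):
    """One delta-rule update: theta <- theta - lr * dL/dtheta,
    with the gradient computed on the minibatch (X_mini, y_mini).
    Assumes a linear model and MSE loss (illustrative choice)."""
    preds = X_mini @ theta                                 # model predictions on the minibatch
    grad = 2 * X_mini.T @ (preds - y_mini) / len(y_mini)   # dL/dtheta for MSE loss
    return theta - lr * grad                               # step against the gradient

# Usage: a single step should reduce the minibatch loss.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))          # hypothetical minibatch of 8 samples
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta                   # synthetic targets
loss = lambda th: np.mean((X @ th - y) ** 2)

theta0 = np.zeros(3)
theta1 = sgd_step(theta0, X, y, lr=0.01)
```

Repeating `sgd_step` over many minibatches is exactly the iteration $\theta_t \to \theta_{t+1}$ described in the formula; the learning rate trades off step size against stability.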


Updated 2026-05-03


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
