Learn Before
Formula

Loss Gradient over a Mini-batch

The expression $\frac{\partial L_{\theta_t}(\mathcal{D}_{\mathrm{mini}})}{\partial \theta_t}$ represents the gradient of the loss function $L$ with respect to the model parameters $\theta_t$. This gradient is computed on a specific mini-batch of training samples, $\mathcal{D}_{\mathrm{mini}}$, and points in the direction of steepest increase in the loss for that batch.
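The gradient above can be made concrete with a small numerical sketch. The example below (hypothetical data, a linear model, and a mean-squared-error loss, none of which come from the text) computes the mini-batch gradient in closed form and checks that stepping against it decreases the mini-batch loss:

```python
import numpy as np

# Hypothetical mini-batch D_mini: 8 samples with 3 features each
rng = np.random.default_rng(0)
X_mini = rng.normal(size=(8, 3))
y_mini = rng.normal(size=8)
theta_t = np.zeros(3)  # current model parameters theta_t

def loss(theta, X, y):
    # L_theta(D_mini): mean squared error of the linear model X @ theta
    return np.mean((X @ theta - y) ** 2)

def loss_grad(theta, X, y):
    # Closed-form gradient dL/dtheta = (2/N) * X^T (X @ theta - y),
    # computed only over the samples in the mini-batch
    return 2.0 / len(y) * X.T @ (X @ theta - y)

g = loss_grad(theta_t, X_mini, y_mini)

# The gradient points toward steepest increase, so a small step
# in the opposite direction lowers the mini-batch loss
eta = 0.1
assert loss(theta_t - eta * g, X_mini, y_mini) < loss(theta_t, X_mini, y_mini)
```

In stochastic gradient descent this per-mini-batch gradient replaces the full-dataset gradient, trading exactness for much cheaper updates.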

Updated 2026-04-21


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences