Learn Before
  • Data Parallelism

Gradient Descent Update Rule

The standard delta rule for gradient descent updates a model's parameters by moving them in the direction opposite to the gradient of the loss function. The update is performed according to the formula

$$\theta_{t+1} = \theta_t - lr \cdot \frac{\partial L_{\theta_t}(D_{\text{mini}})}{\partial \theta_t}$$

In this equation, $\theta_{t+1}$ are the updated parameters, $\theta_t$ are the parameters at the current step, $lr$ is the learning rate, and the fractional term is the gradient of the loss function $L$ with respect to the parameters $\theta_t$, computed on a mini-batch of data $D_{\text{mini}}$.
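As a minimal sketch (not from the source), the update rule can be written in a few lines of Python; the function name `sgd_update` and the numeric values below are hypothetical, chosen only to illustrate one step of the update.

```python
import numpy as np

def sgd_update(theta, grad, lr):
    """Apply one gradient descent step: theta_{t+1} = theta_t - lr * grad.

    `grad` is assumed to be the gradient of the loss L on a mini-batch
    D_mini with respect to the current parameters theta_t.
    """
    return theta - lr * grad

# Hypothetical values for illustration only.
theta_t = np.array([0.5, -1.2])          # current parameters theta_t
grad_minibatch = np.array([0.1, -0.3])   # dL_{theta_t}(D_mini) / d theta_t
theta_next = sgd_update(theta_t, grad_minibatch, lr=0.1)
print(theta_next)  # [ 0.49 -1.17]
```

In data parallelism, `grad_minibatch` would be the aggregated gradient gathered from the workers before this update is applied.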

Tags
  • Ch.2 Generative Models - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Foundations of Large Language Models Course
  • Computing Sciences

Related
  • Gradient Descent Update Rule

  • Set of Distributed Data Batches in Data Parallelism

  • Gradient Aggregation in Data Parallelism

  • Ideal Speed-up in Data Parallelism