Objective Function Change Bounds in Gradient Descent
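Assume the objective function $f$ is Lipschitz continuous with constant $L$, meaning that for any $\mathbf{x}$ and $\mathbf{y}$ it satisfies $|f(\mathbf{x}) - f(\mathbf{y})| \leq L \|\mathbf{x} - \mathbf{y}\|$. A gradient descent update replaces $\mathbf{x}$ with $\mathbf{x} - \eta \mathbf{g}$, where $\eta > 0$ is the learning rate and $\mathbf{g}$ is the gradient. Setting $\mathbf{y} = \mathbf{x} - \eta \mathbf{g}$ in the Lipschitz condition bounds the change in the objective value after one update: $|f(\mathbf{x}) - f(\mathbf{x} - \eta \mathbf{g})| \leq L \eta \|\mathbf{g}\|$. The change in the loss during a single step is therefore limited by the product of the learning rate $\eta$, the gradient norm $\|\mathbf{g}\|$, and the Lipschitz constant $L$. A small upper bound is a trade-off: it limits how quickly the objective value can be reduced, but it also limits how much damage a single step can do if it moves in the wrong direction.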
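The bound can be checked numerically. The following is a minimal sketch, not a reference implementation: the objective `f`, the helper `check_bound`, and the symbol names `L` (Lipschitz constant), `eta` (learning rate), and `g` (gradient) are all assumptions chosen for illustration. The example objective is a sum of terms sqrt(1 + x_i^2), whose partial derivatives all lie in (-1, 1), so sqrt(dim) serves as a valid Lipschitz constant.

```python
import math
import random

def f(x):
    """Illustrative smooth objective: sum of sqrt(1 + x_i^2)."""
    return sum(math.sqrt(1.0 + xi * xi) for xi in x)

def grad_f(x):
    """Gradient of f; each component lies strictly inside (-1, 1)."""
    return [xi / math.sqrt(1.0 + xi * xi) for xi in x]

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))

def check_bound(x, eta, L):
    """Return (|f(x) - f(x - eta*g)|, L * eta * ||g||) for one gradient step."""
    g = grad_f(x)
    x_new = [xi - eta * gi for xi, gi in zip(x, g)]
    change = abs(f(x) - f(x_new))
    bound = L * eta * norm(g)
    return change, bound

random.seed(0)
dim = 5
# Assumed Lipschitz constant: the gradient norm of f never exceeds sqrt(dim).
L = math.sqrt(dim)

results = []
for eta in (0.01, 0.1, 1.0):
    x = [random.uniform(-3.0, 3.0) for _ in range(dim)]
    results.append(check_bound(x, eta, L))

for change, bound in results:
    print(f"change = {change:.6f}, bound = {bound:.6f}")
```

In each trial the observed change stays at or below the bound. Shrinking `eta` shrinks both quantities, which is the trade-off described above: smaller steps are safer but make slower progress.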
Tags
D2L
Dive into Deep Learning @ D2L
Related
Gradient Descent Reference
Linear Regression and Gradient Descent
Numerical Approximation of Gradients
Gradient Checking
(Batch) Gradient Descent (Deep Learning Optimization Algorithm)
Gradient Descent Explained
Why Gradient descent might fail?
A Chat with Andrew on MLOps: From Model-centric to Data-centric AI
Big Data to Good Data: Andrew Ng Urges ML Community To Be More Data-Centric and Less Model-Centric
MLOps: Data-centric and Model-centric approaches
Critical Points
First-order Optimization Algorithm
Method of Steepest Descent
Second-Order Gradient Methods
Gradient Descent Explanation
Gradient Descent Variants
Notes about gradient descent
Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true?
Vanishing/exploding gradient
BERT Training Process
Objective Function
Distributed Training
The Problem with Constant Initialization
One-Dimensional Gradient Descent
Multivariate Gradient Descent
Second-Order Optimization Algorithm
Average Objective Function in Deep Learning
Accelerated Gradient Methods