One-Dimensional Gradient Descent
One-dimensional gradient descent provides a clear illustration of why moving in the negative gradient direction reduces the objective function. For a continuously differentiable function $f: \mathbb{R} \to \mathbb{R}$, the first-order Taylor expansion gives $f(x + \epsilon) = f(x) + \epsilon f'(x) + \mathcal{O}(\epsilon^2)$. Setting the step as $\epsilon = -\eta f'(x)$, where $\eta > 0$ is a fixed learning rate, yields $f(x - \eta f'(x)) = f(x) - \eta f'(x)^2 + \mathcal{O}(\eta^2 f'(x)^2)$. When the derivative $f'(x) \neq 0$, the term $-\eta f'(x)^2$ is strictly negative and so guarantees a decrease in $f$, provided $\eta$ is small enough for the higher-order terms to be negligible. This leads to the update rule $x \leftarrow x - \eta f'(x)$, which is applied iteratively from an initial value of $x$ until a stopping condition is met, such as when the gradient magnitude $|f'(x)|$ becomes sufficiently small or a maximum number of iterations is reached.
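The update rule and both stopping conditions can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the objective f(x) = x^2, the starting point 10.0, the learning rate 0.2, the tolerance, and the names `gradient_descent_1d`, `f`, and `f_prime` are all assumptions chosen for the example.

```python
def gradient_descent_1d(grad, x0, lr=0.2, tol=1e-6, max_iters=100):
    """Apply x <- x - lr * grad(x) repeatedly and return every iterate visited."""
    x = x0
    xs = [x]
    for _ in range(max_iters):      # stopping condition 2: iteration budget
        g = grad(x)
        if abs(g) < tol:            # stopping condition 1: small gradient magnitude
            break
        x = x - lr * g              # step in the negative gradient direction
        xs.append(x)
    return xs


# Illustrative objective (an assumption for this sketch): f(x) = x^2, f'(x) = 2x.
def f(x):
    return x ** 2


def f_prime(x):
    return 2 * x


trajectory = gradient_descent_1d(f_prime, x0=10.0)
print(f"steps taken: {len(trajectory) - 1}, final x: {trajectory[-1]:.3e}")
```

For this objective each step multiplies x by (1 - 2 * lr), so any learning rate between 0 and 1 shrinks |x| toward the minimizer at 0, while a learning rate above 1 makes the iterates grow. That is a concrete instance of the requirement that the learning rate be small enough for the first-order term to dominate.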
Tags
D2L
Dive into Deep Learning @ D2L
Related
Gradient Descent Reference
Linear Regression and Gradient Descent
Numerical Approximation of Gradients
Gradient Checking
(Batch) Gradient Descent (Deep Learning Optimization Algorithm)
Gradient Descent Explained
Why Gradient descent might fail?
A Chat with Andrew on MLOps: From Model-centric to Data-centric AI
Big Data to Good Data: Andrew Ng Urges ML Community To Be More Data-Centric and Less Model-Centric
MLOps: Data-centric and Model-centric approaches
Critical Points
First-order Optimization Algorithm
Method of Steepest Descent
Second-Order Gradient Methods
Gradient Descent Explanation
Gradient Descent Variants
Notes about gradient descent
Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true?
Vanishing/exploding gradient
BERT Training Process
Objective Function
Distributed Training
The Problem with Constant Initialization
Objective Function Change Bounds in Gradient Descent
One-Dimensional Gradient Descent
Multivariate Gradient Descent
Second-Order Optimization Algorithm
Average Objective Function in Deep Learning
Accelerated Gradient Methods