Multivariate Gradient Descent

When the objective function maps a $$d$$-dimensional vector $$\mathbf{x} = [x_1, x_2, \ldots, x_d]^\top$$ to a scalar, i.e., $$f: \mathbb{R}^d \to \mathbb{R}$$, its gradient becomes a vector of $$d$$ partial derivatives:

$$\nabla f(\mathbf{x}) = \left[\frac{\partial f(\mathbf{x})}{\partial x_1}, \frac{\partial f(\mathbf{x})}{\partial x_2}, \ldots, \frac{\partial f(\mathbf{x})}{\partial x_d}\right]^\top$$

Each component $$\partial f(\mathbf{x})/\partial x_i$$ captures the rate at which $$f$$ changes with respect to $$x_i$$ alone. Using the first-order multivariate Taylor expansion,

$$f(\mathbf{x} + \boldsymbol{\epsilon}) = f(\mathbf{x}) + \boldsymbol{\epsilon}^\top \nabla f(\mathbf{x}) + \mathcal{O}(\|\boldsymbol{\epsilon}\|^2)$$

one can show that the steepest-descent direction (up to second-order terms) is the negative gradient $$-\nabla f(\mathbf{x})$$. Choosing a suitable learning rate $$\eta > 0$$ yields the multivariate gradient descent update rule:

$$\mathbf{x} \leftarrow \mathbf{x} - \eta \nabla f(\mathbf{x})$$

This directly generalizes the scalar update $$x \leftarrow x - \eta f'(x)$$ to vector-valued parameters.
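As a minimal sketch of the update rule, the pure-Python example below runs gradient descent on an assumed quadratic objective $$f(\mathbf{x}) = x_1^2 + 2x_2^2$$. The objective, the learning rate of 0.1, the starting point $$[-5, -2]^\top$$, and the 20 steps are illustrative choices, not values from the text above. The helper names (`f`, `grad_f`, `numerical_grad`, `gradient_descent`) are hypothetical. `numerical_grad` approximates each partial derivative by perturbing one coordinate at a time, which mirrors the definition of the gradient components.

```python
def f(x):
    """Assumed example objective: f(x) = x1^2 + 2 * x2^2."""
    return x[0] ** 2 + 2 * x[1] ** 2


def grad_f(x):
    """Analytic gradient of the example objective: [2 * x1, 4 * x2]."""
    return [2 * x[0], 4 * x[1]]


def numerical_grad(func, x, h=1e-6):
    """Central-difference estimate of the gradient, one coordinate at a time."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h   # perturb only coordinate i
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad


def gradient_descent(grad, x0, eta=0.1, num_steps=20):
    """Repeat the update x <- x - eta * grad(x) for a fixed number of steps."""
    x = list(x0)
    trajectory = [list(x)]  # keep every iterate, including the start
    for _ in range(num_steps):
        g = grad(x)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        trajectory.append(list(x))
    return x, trajectory


x_final, trajectory = gradient_descent(grad_f, x0=[-5.0, -2.0])
print("final x:", x_final)
print("final f(x):", f(x_final))
```

For this particular objective and learning rate, each step multiplies $$x_1$$ by $$1 - 2\eta = 0.8$$ and $$x_2$$ by $$1 - 4\eta = 0.6$$, so both coordinates shrink toward the minimizer at the origin. A learning rate that is too large (here, $$\eta \geq 0.5$$) would stop $$x_2$$ from converging.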

Updated 2026-05-15

Tags

D2L

Dive into Deep Learning @ D2L