Residual Mapping

In a residual network, the desired underlying mapping $f(\mathbf{x})$, the function the network ultimately aims to approximate, is not learned directly by a stack of layers. Instead, those layers are reformulated to learn only the residual mapping $g(\mathbf{x}) = f(\mathbf{x}) - \mathbf{x}$, and the target function is recovered as $f(\mathbf{x}) = g(\mathbf{x}) + \mathbf{x}$. This reformulation is motivated by the degradation problem: as plain networks grow deeper, their training accuracy can paradoxically worsen, suggesting that the added layers struggle to approximate even the identity function. By recasting the problem in terms of $g(\mathbf{x})$, the identity case $f(\mathbf{x}) = \mathbf{x}$ reduces to $g(\mathbf{x}) = 0$, which is significantly easier for a network to learn because it only requires driving the weights and biases of the constituent layers toward zero.
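The identity case can be illustrated with a minimal sketch in plain Python. The layer layout here (two affine layers with a ReLU between them) and the helper names `affine`, `residual_branch`, `residual_block`, and `zero_params` are assumptions chosen for illustration, not a reference implementation of any particular residual architecture.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def affine(weights, bias, v):
    """Compute W v + b for a square weight matrix stored as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def residual_branch(params, x):
    """g(x): the part the stacked layers actually learn (hypothetical layout)."""
    w1, b1, w2, b2 = params
    return affine(w2, b2, relu(affine(w1, b1, x)))


def residual_block(params, x):
    """f(x) = g(x) + x: add the input back through the shortcut connection."""
    g = residual_branch(params, x)
    return [gi + xi for gi, xi in zip(g, x)]


def zero_params(dim):
    """All weights and biases set to zero, so that g(x) = 0 for every x."""
    zero_matrix = [[0.0] * dim for _ in range(dim)]
    zero_vector = [0.0] * dim
    return zero_matrix, zero_vector, [row[:] for row in zero_matrix], zero_vector[:]


x = [1.5, -2.0, 0.25]
params = zero_params(len(x))

print(residual_branch(params, x))  # g(x) with all-zero parameters
print(residual_block(params, x))   # f(x) = g(x) + x
```

With every weight and bias at zero, the branch outputs a zero vector and the block returns `x` unchanged. A plain stack of the same layers would instead need its weights to reproduce the identity map exactly, which is the harder target described above.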

Updated 2026-05-18

Tags

Data Science

D2L

Dive into Deep Learning @ D2L