Concept

Translation Invariance

The principle of translation invariance in computer vision asserts that a network should recognize an object regardless of where it appears in an image. Applied as a constraint on a multi-layer perceptron (MLP) over images, it requires that a shift in the input $\mathbf{X}$ lead to the same shift in the hidden representation $\mathbf{H}$. This is possible only if the weight tensor $\mathsf{V}$ and the bias $\mathbf{U}$ do not depend on the absolute spatial coordinates $(i, j)$. With a constant bias $u$ and a shared set of weights $[\mathbf{V}]_{a, b}$, the hidden representation simplifies to:

$$[\mathbf{H}]_{i, j} = u + \sum_a \sum_b [\mathbf{V}]_{a, b} [\mathbf{X}]_{i+a, j+b}$$
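The formula above can be sketched in a few lines of plain Python to make the shift behavior concrete. This is a minimal illustration, not a reference implementation: the function names `hidden` and `shift`, the finite offset range `delta`, the zero padding at the image border, and the toy integer values are all assumptions introduced here.

```python
def hidden(X, V, u, delta):
    """Compute [H]_{i,j} = u + sum_a sum_b [V]_{a,b} [X]_{i+a,j+b}.

    V is a (2*delta+1) x (2*delta+1) grid of shared weights, indexed by
    offsets a, b in [-delta, delta]. Pixels outside X count as zero
    (an assumption made for this sketch).
    """
    rows, cols = len(X), len(X[0])
    H = [[u] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = u
            for a in range(-delta, delta + 1):
                for b in range(-delta, delta + 1):
                    r, c = i + a, j + b
                    if 0 <= r < rows and 0 <= c < cols:
                        total += V[a + delta][b + delta] * X[r][c]
            H[i][j] = total
    return H


def shift(M, di, dj, fill=0):
    """Move the content of M down by di rows and right by dj columns."""
    rows, cols = len(M), len(M[0])
    out = [[fill] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            r, c = i - di, j - dj
            if 0 <= r < rows and 0 <= c < cols:
                out[i][j] = M[r][c]
    return out


# Toy 8x8 image with a small "object" away from the borders.
X = [[0] * 8 for _ in range(8)]
X[2][2], X[2][3], X[3][2] = 1, 2, 3

V = [[1, 0, -1],
     [2, 0, -2],
     [1, 0, -1]]   # shared weights, the same at every position (i, j)
u = 5              # constant bias

H = hidden(X, V, u, delta=1)
H_of_shifted = hidden(shift(X, 2, 3), V, u, delta=1)
shifted_H = shift(H, 2, 3, fill=u)   # vacated cells hold only the bias

print(H_of_shifted == shifted_H)
```

Because neither `V` nor `u` depends on `(i, j)`, shifting the input and then computing the hidden representation gives the same result as computing it first and shifting afterwards. In this sketch the equality is exact only while the object and its response stay clear of the image border, where the zero padding would otherwise interfere.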

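This weight sharing dramatically reduces the parameter count, because the weights are indexed only by the offsets $(a, b)$ rather than by every position $(i, j)$ as well (e.g., from $10^{12}$ to $4 \times 10^6$ for a 1-megapixel image), and effectively forms a convolution.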
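The parameter counts quoted above can be reproduced with simple arithmetic. The sketch below assumes a square 1000 x 1000 image, a hidden representation of the same shape, and offsets $a$ and $b$ that each range over roughly $(-1000, 1000)$; the variable names are illustrative only.

```python
side = 1000                        # assumed 1-megapixel image: 1000 x 1000
pixels = side * side               # 10**6 inputs, and as many hidden units

# Position-dependent weights [V]_{i,j,a,b}: one weight per
# (hidden unit, input pixel) pair.
full_params = pixels * pixels

# Shared weights [V]_{a,b}: one weight per offset pair, with a and b
# each taking about 2 * side values.
offsets = 2 * side
shared_params = offsets * offsets

print(full_params, shared_params)
```

Under these assumptions the counts come out to $10^{12}$ and $4 \times 10^6$, matching the figures in the text.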

Updated 2026-05-09

Tags

D2L

Dive into Deep Learning @ D2L
