Constraining MLPs for Images

When applying a multi-layer perceptron (MLP) to two-dimensional images, both the inputs $\mathbf{X}$ and the hidden representations $\mathbf{H}$ can be treated as matrices with spatial structure. To allow every hidden unit to receive input from every pixel, the network's parameters are represented as a fourth-order weight tensor $\mathsf{W}$ and a bias matrix $\mathbf{U}$. The fully connected layer is formally expressed as:

$$[\mathbf{H}]_{i, j} = [\mathbf{U}]_{i, j} + \sum_k \sum_l [\mathsf{W}]_{i, j, k, l} [\mathbf{X}]_{k, l} = [\mathbf{U}]_{i, j} + \sum_a \sum_b [\mathsf{V}]_{i, j, a, b} [\mathbf{X}]_{i+a, j+b}$$

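where $[\mathsf{V}]_{i, j, a, b} = [\mathsf{W}]_{i, j, i+a, j+b}$, which follows from substituting $k = i+a$ and $l = j+b$. The offsets $a$ and $b$ take both positive and negative values, so the sums still cover the entire image. A single fully connected layer mapping a $1000 \times 1000$ pixel image to a hidden representation of the same size has $10^6$ inputs and $10^6$ hidden units, so this parametrization requires $10^6 \times 10^6 = 10^{12}$ weights, which is computationally intractable.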
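The re-indexing and the parameter count can be checked numerically on a small image. The following is a minimal NumPy sketch, not code from the source: the function names `fc_layer_kl`, `fc_layer_offsets`, and `num_weights` are hypothetical, and the 4 × 5 image size is an arbitrary choice for illustration.

```python
import numpy as np


def fc_layer_kl(X, W, U):
    """H[i, j] = U[i, j] + sum_k sum_l W[i, j, k, l] * X[k, l]."""
    return U + np.einsum("ijkl,kl->ij", W, X)


def fc_layer_offsets(X, W, U):
    """Same layer written with offsets a, b, where V[i, j, a, b] = W[i, j, i+a, j+b]."""
    height, width = X.shape
    H = U.copy()
    for i in range(height):
        for j in range(width):
            # Offsets run over negative and positive values so that
            # (i + a, j + b) visits every pixel of the image exactly once.
            for a in range(-i, height - i):
                for b in range(-j, width - j):
                    v = W[i, j, i + a, j + b]  # this entry is V[i, j, a, b]
                    H[i, j] += v * X[i + a, j + b]
    return H


def num_weights(height, width):
    """Weights needed when every hidden unit sees every pixel (same-size output)."""
    return (height * width) ** 2


# Small illustrative example (sizes are an assumption, chosen to keep W tiny).
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5))
W = rng.standard_normal((4, 5, 4, 5))
U = rng.standard_normal((4, 5))

H_kl = fc_layer_kl(X, W, U)
H_ab = fc_layer_offsets(X, W, U)
same_result = np.allclose(H_kl, H_ab)

# Weight count for the 1000 x 1000 case discussed in the text.
weights_megapixel = num_weights(1000, 1000)
```

Both functions compute the same hidden representation, since the offset form only relabels the summation indices. The count for the megapixel case is evaluated with plain integer arithmetic, so no tensor of that size is ever allocated.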

Updated 2026-05-09

Tags

D2L

Dive into Deep Learning @ D2L
