1Cademy - Input Cropping Divisibility in Fully Convolutional Networks

Learn Before

Fixed-Shape Cropping in Semantic Segmentation

Concept

Input Cropping Divisibility in Fully Convolutional Networks

When training a Fully Convolutional Network (FCN) for semantic segmentation, input images are randomly cropped to a fixed shape rather than rescaled, so that every pixel keeps an exact correspondence with its label. For the spatial dimensions of the network's final output to match the input crop after the downsampling and upsampling operations, the crop height and width must both be exactly divisible by the network's total downsampling factor. For example, if the feature extraction backbone reduces the spatial dimensions by a factor of $32$, a crop of $320 \times 480$ works because $320 = 32 \times 10$ and $480 = 32 \times 15$. Choosing such a crop prevents a spatial mismatch when the transposed convolutional layer upsamples the feature maps back to the original crop size.
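The shape arithmetic behind this requirement can be traced with a small sketch. The layer hyperparameters below are assumptions modeled on a ResNet-18-style backbone (a 7×7 stride-2 convolution, a 3×3 stride-2 pooling layer, and three further stride-2 stages) followed by a transposed convolution with kernel 64, stride 32, and padding 16; the helper names are hypothetical and not part of any library.

```python
def conv_out(size, kernel, stride, padding):
    """Output length along one axis of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def transposed_conv_out(size, kernel, stride, padding):
    """Output length along one axis of a transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel


def fcn_round_trip(size):
    """Return (feature map length, upsampled length) for one input axis.

    Assumed architecture: five stride-2 stages (total factor 32),
    then one transposed convolution that upsamples by 32.
    """
    feat = conv_out(size, kernel=7, stride=2, padding=3)   # stem convolution
    feat = conv_out(feat, kernel=3, stride=2, padding=1)   # max pooling
    for _ in range(3):                                     # three stride-2 stages
        feat = conv_out(feat, kernel=3, stride=2, padding=1)
    out = transposed_conv_out(feat, kernel=64, stride=32, padding=16)
    return feat, out


for size in (320, 480, 330):
    feat, out = fcn_round_trip(size)
    status = "match" if out == size else "mismatch"
    print(f"input {size} -> features {feat} -> output {out} ({status})")
```

Under these assumptions, 320 and 480 come back unchanged, while 330 is rounded up at each stride-2 stage and returns as 352, so the prediction would no longer align with the label crop.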

# PyTorch
from d2l import torch as d2l
# Crop height 320 and width 480 are both multiples of 32.
batch_size, crop_size = 32, (320, 480)
train_iter, test_iter = d2l.load_data_voc(batch_size, crop_size)

# MXNet
from d2l import mxnet as d2l
# Crop height 320 and width 480 are both multiples of 32.
batch_size, crop_size = 32, (320, 480)
train_iter, test_iter = d2l.load_data_voc(batch_size, crop_size)
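Before a crop size is handed to a data loader, it can be checked or adjusted with a few lines of plain Python. This is a minimal sketch: `is_valid_crop_size` and `snap_crop_size` are hypothetical helpers, not d2l functions, and the default factor of 32 is an assumption taken from the example above.

```python
def is_valid_crop_size(crop_size, factor=32):
    """True if both height and width are exact multiples of `factor`."""
    return all(side % factor == 0 for side in crop_size)


def snap_crop_size(crop_size, factor=32):
    """Round height and width down to the nearest multiple of `factor`."""
    height, width = crop_size
    snapped = (height // factor * factor, width // factor * factor)
    if min(snapped) == 0:
        raise ValueError(
            f"crop size {crop_size} is smaller than the downsampling factor {factor}"
        )
    return snapped


requested = (330, 500)
if not is_valid_crop_size(requested):
    requested = snap_crop_size(requested)
print(requested)
```

Rounding down rather than up keeps the crop inside images that are only just large enough for the requested size.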

Updated 2026-05-21

References

Dive into Deep Learning

Learn After

Overlapping Crop Prediction in Fully Convolutional Networks
