Input Cropping Divisibility in Fully Convolutional Networks
In a fully convolutional network (FCN), input images are randomly cropped to a fixed shape so that each pixel of the crop stays aligned with its label in semantic segmentation. For the spatial dimensions of the network's final output to match the input crop after a series of downsampling and upsampling operations, the crop height and width must both be exactly divisible by the network's total downsampling factor. For example, if the feature extraction backbone reduces the spatial dimensions by a factor of 32, the height and width of the randomly cropped inputs (such as 320 × 480) must both be divisible by 32. This prevents a spatial mismatch when the transposed convolutional layer upsamples the feature maps back to the original crop size.
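The arithmetic behind this requirement can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the downsampling factor is taken to be 32, the padded strided layers of the backbone are assumed to round sizes up, and the transposed convolution is assumed to multiply the size by exactly the same factor. The function name `fcn_output_size` is hypothetical and not part of any library.

```python
def fcn_output_size(size, factor=32):
    """Size after downsampling by `factor` and upsampling by the same factor.

    Assumption: the downsampling path rounds up (as padded strided layers
    typically do), and the upsampling step multiplies by exactly `factor`.
    """
    downsampled = -(-size // factor)  # integer ceiling division
    return downsampled * factor


for size in (320, 480, 300):
    out = fcn_output_size(size)
    status = "match" if out == size else "mismatch"
    print(f"input {size} -> output {out} ({status})")
```

Under these assumptions, 320 and 480 survive the round trip unchanged, while 300 comes back as 320, so the prediction would no longer line up pixel for pixel with a 300-pixel label.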
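# PyTorch
from d2l import torch as d2l
# Crop height and width; both are multiples of 32
batch_size, crop_size = 32, (320, 480)
train_iter, test_iter = d2l.load_data_voc(batch_size, crop_size)

# MXNet
from d2l import mxnet as d2l
# Crop height and width; both are multiples of 32
batch_size, crop_size = 32, (320, 480)
train_iter, test_iter = d2l.load_data_voc(batch_size, crop_size)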
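When experimenting with other crop sizes, one option is to round a desired size down to the nearest valid multiple before passing it to the data loader. The helper below is a minimal sketch; `snap_crop_size` is a hypothetical name, not a d2l function, and the default factor of 32 is an assumption that should be changed to match the backbone actually in use.

```python
def snap_crop_size(height, width, factor=32):
    """Round a crop size down to the nearest multiples of `factor`.

    Hypothetical helper for illustration; `factor` is assumed to be the
    network's total downsampling factor.
    """
    snapped = (height // factor * factor, width // factor * factor)
    if 0 in snapped:
        raise ValueError(
            f"crop size {(height, width)} is smaller than factor {factor}"
        )
    return snapped


print(snap_crop_size(320, 480))  # already valid, returned unchanged
print(snap_crop_size(300, 500))  # rounded down to multiples of 32
```

Rounding down rather than up keeps the crop inside images that are only just large enough for the requested size.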
Tags
D2L
Dive into Deep Learning @ D2L