Concept
Mask R-CNN Pixel-Level Prediction
In the Mask R-CNN architecture, the region of interest (RoI) alignment layer outputs feature maps of the same shape for every region, and these feature maps feed several tasks in parallel. They continue to be used to predict the class and bounding box of each region of interest, and at the same time they are passed into an additional fully convolutional network. Because RoI alignment preserves the spatial information in the feature maps, this fully convolutional network can predict the position of the object at the pixel level, that is, a mask for each region.
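To make the parallel branches concrete, the sketch below traces output shapes only, in plain Python. The text does not specify any layer layout, so everything here is an assumption chosen for illustration: a 14 x 14 RoI-aligned feature map, a mask branch made of four 3 x 3 convolutions, one 2 x 2 transposed convolution with stride 2, and a final 1 x 1 convolution with one output channel per class, plus 80 classes and 4 box offsets per region. The helper names (`conv_out`, `deconv_out`, `mask_head_shape`, `box_head_shapes`) are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a standard convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel


def mask_head_shape(num_rois, roi_size=14, num_classes=80):
    """Output shape of the fully convolutional mask branch.

    The layer layout is an assumed example, not taken from the text.
    """
    size = roi_size
    for _ in range(4):
        # 3 x 3 convolution with padding 1 keeps the spatial size.
        size = conv_out(size, kernel=3, padding=1)
    # Transposed convolution doubles the spatial resolution.
    size = deconv_out(size, kernel=2, stride=2)
    # 1 x 1 convolution: one mask channel per class, same spatial size.
    size = conv_out(size, kernel=1)
    return (num_rois, num_classes, size, size)


def box_head_shapes(num_rois, num_classes=80):
    """Output shapes of the class and bounding box branch (assumed sizes)."""
    class_scores = (num_rois, num_classes)
    box_offsets = (num_rois, 4)
    return class_scores, box_offsets


if __name__ == "__main__":
    num_rois = 3  # every RoI has the same feature map shape after RoI alignment
    print("mask output:", mask_head_shape(num_rois))
    print("class and box outputs:", box_head_shapes(num_rois))
```

Under these assumptions, the mask branch keeps a spatial grid for every region, while the class and box branch collapses each region to a vector. This is why the mask branch is fully convolutional: it never flattens away the spatial layout it needs for per-pixel output.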
Updated 2026-05-21
Tags
D2L
Dive into Deep Learning @ D2L