
Mask R-CNN Pixel-Level Prediction

In the Mask R-CNN architecture, the region of interest (RoI) alignment layer outputs feature maps of the same shape for every region of interest, and these feature maps feed several prediction tasks in parallel. They are still used to predict the class and bounding box of each region of interest, and they are also passed to an additional fully convolutional network. Because RoI alignment preserves the spatial information in the feature maps, this fully convolutional network can predict the position of the object at the pixel level.
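The parallel heads described above can be illustrated with a minimal NumPy sketch. It is not the actual Mask R-CNN implementation: the tensor sizes, the random weights, the single 1x1 convolution standing in for the fully convolutional network, and the helper names `class_and_box_heads` and `mask_head` are all assumptions made for illustration. The point is only to show that the class and bounding box heads flatten away the spatial layout, while the mask head keeps a height-by-width grid per region.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes (not from the source): 3 RoIs, 8 channels,
# 7x7 aligned feature maps, 4 object classes.
num_rois, channels, height, width, num_classes = 3, 8, 7, 7, 4

# Stand-in for the RoI alignment layer's output: one feature map of the
# same shape for every region of interest.
roi_features = rng.standard_normal((num_rois, channels, height, width))


def class_and_box_heads(features, w_cls, w_box):
    """Flatten each RoI's features and apply two linear layers."""
    flat = features.reshape(features.shape[0], -1)
    return flat @ w_cls, flat @ w_box


def mask_head(features, w_mask):
    """Toy fully convolutional head: one 1x1 convolution plus a sigmoid.

    A 1x1 convolution mixes channels independently at every pixel, so the
    spatial layout (height x width) of the input is preserved.
    """
    logits = np.einsum("kc,nchw->nkhw", w_mask, features)
    return 1.0 / (1.0 + np.exp(-logits))


# Hypothetical randomly initialized weights; a real model would learn these.
flat_dim = channels * height * width
w_cls = rng.standard_normal((flat_dim, num_classes)) * 0.01
w_box = rng.standard_normal((flat_dim, 4)) * 0.01  # simplified: 4 offsets per RoI
w_mask = rng.standard_normal((num_classes, channels)) * 0.1

class_scores, box_offsets = class_and_box_heads(roi_features, w_cls, w_box)
mask_probs = mask_head(roi_features, w_mask)

print(class_scores.shape)  # (3, 4)
print(box_offsets.shape)   # (3, 4)
print(mask_probs.shape)    # (3, 4, 7, 7)
```

Each entry of `mask_probs` can be read as the probability that a given pixel of the aligned region belongs to an object of a given class. A practical mask head would typically stack several convolutional layers and upsample the result, which this sketch omits.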


Updated 2026-05-21


Tags

D2L

Dive into Deep Learning @ D2L