Concept

Fast R-CNN Architecture Pipeline

The Fast R-CNN model processes an image in four main computational steps. First, a trainable CNN extracts features from the entire image, producing a feature map of shape $1 \times c \times h_1 \times w_1$. Second, selective search generates $n$ region proposals of different sizes, which are mapped onto the CNN output as regions of interest. A region of interest (RoI) pooling layer then extracts features of the same shape from each region and concatenates them into an array of shape $n \times c \times h_2 \times w_2$. Third, a fully connected layer transforms these features into an output of shape $n \times d$. Finally, this output is transformed into an $n \times q$ output for object class prediction using softmax regression, where $q$ is the number of classes, and an $n \times 4$ output for bounding box prediction.
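The following is a minimal NumPy sketch that traces the array shapes from the RoI pooling step through the two prediction outputs. The backbone CNN and selective search are not implemented: the feature map (shown without its leading batch dimension of 1) and the proposal boxes are passed in directly. Several details are illustrative assumptions, not the D2L or torchvision implementation:

- the function names `roi_max_pool` and `fast_rcnn_head`;
- the `spatial_scale` factor that maps image coordinates onto the feature map;
- the half-up rounding and floor/ceil binning;
- the ReLU after the fully connected layer;
- the random weights.

```python
import numpy as np


def roi_max_pool(features, rois, out_size, spatial_scale=1.0):
    """Hypothetical RoI max pooling for one image.

    features: (c, h1, w1) feature map from the backbone CNN.
    rois: (n, 4) proposal boxes (x0, y0, x1, y1) in input-image coordinates.
    spatial_scale: assumed factor mapping image coordinates to feature-map cells.
    Returns an array of shape (n, c, h2, w2), where (h2, w2) = out_size.
    """
    c, h1, w1 = features.shape
    h2, w2 = out_size
    out = np.empty((len(rois), c, h2, w2), dtype=features.dtype)
    for i, box in enumerate(rois):
        # Map the box onto the feature map, round half up, and clip to bounds.
        x0, y0, x1, y1 = (int(np.floor(v * spatial_scale + 0.5)) for v in box)
        x0, x1 = np.clip([x0, x1], 0, w1 - 1)
        y0, y1 = np.clip([y0, y1], 0, h1 - 1)
        region = features[:, y0:max(y1, y0) + 1, x0:max(x1, x0) + 1]
        # Split the region into an h2 x w2 grid of roughly equal bins; floor/ceil
        # keeps every bin non-empty (neighbouring bins may share a cell).
        rows = np.linspace(0, region.shape[1], h2 + 1)
        cols = np.linspace(0, region.shape[2], w2 + 1)
        for r in range(h2):
            r0, r1 = int(np.floor(rows[r])), int(np.ceil(rows[r + 1]))
            for s in range(w2):
                s0, s1 = int(np.floor(cols[s])), int(np.ceil(cols[s + 1]))
                # Max over the bin, separately for each channel.
                out[i, :, r, s] = region[:, r0:r1, s0:s1].max(axis=(1, 2))
    return out


def fast_rcnn_head(features, rois, out_size, W_fc, W_cls, W_box, spatial_scale=1.0):
    """Hypothetical head: RoI pooling -> fully connected layer -> two outputs."""
    n = len(rois)
    pooled = roi_max_pool(features, rois, out_size, spatial_scale)  # (n, c, h2, w2)
    flat = pooled.reshape(n, -1)                                   # (n, c*h2*w2)
    hidden = np.maximum(flat @ W_fc, 0.0)                          # (n, d), ReLU assumed
    logits = hidden @ W_cls                                        # (n, q)
    logits = logits - logits.max(axis=1, keepdims=True)            # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)                      # softmax over q classes
    boxes = hidden @ W_box                                         # (n, 4) box outputs
    return probs, boxes


# Toy example: c = 1, h1 = w1 = 4, two proposals, pooled to h2 = w2 = 2.
X = np.arange(16.0).reshape(1, 4, 4)
rois = np.array([[0.0, 0.0, 20.0, 20.0], [0.0, 10.0, 30.0, 30.0]])
pooled = roi_max_pool(X, rois, (2, 2), spatial_scale=0.1)
# pooled[0, 0] -> [[5, 6], [9, 10]];  pooled[1, 0] -> [[9, 11], [13, 15]]

rng = np.random.default_rng(0)
d, q = 8, 3                                    # hidden size and number of classes
W_fc = rng.normal(size=(1 * 2 * 2, d))
W_cls = rng.normal(size=(d, q))
W_box = rng.normal(size=(d, 4))
probs, boxes = fast_rcnn_head(X, rois, (2, 2), W_fc, W_cls, W_box, spatial_scale=0.1)
print(pooled.shape, probs.shape, boxes.shape)  # (2, 1, 2, 2) (2, 3) (2, 4)
```

The two proposals cover regions of different sizes on the feature map (3×3 and 3×4 cells). Both are pooled to 2×2. This shows why the later fully connected layer can treat all $n$ proposals as a single $n \times (c \cdot h_2 \cdot w_2)$ matrix.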


Updated 2026-05-21


Tags

D2L

Dive into Deep Learning @ D2L
