Learn Before
Concept

Bounding Box Tensor Shape

In deep learning frameworks, multiple bounding boxes for a single image are programmatically represented as a two-dimensional tensor of shape (n, 4), where nn is the total number of bounding boxes. When processing a minibatch of images, this representation expands to a three-dimensional tensor of shape (b, n, 4), where bb represents the batch size. In both configurations, the final dimension of size 4 holds the numerical parameters defining each box, such as the two corner coordinates or the center position paired with width and height.

0

1

Updated 2026-06-14

Tags

D2L

Dive into Deep Learning @ D2L