Formula

Output Shape of Patch Embedding in Vision Transformers

When a patch embedding is applied to an input image whose height and width are both $\text{img\_size}$, using square patches of side $\text{patch\_size}$, the resulting sequence contains $(\text{img\_size} // \text{patch\_size})^2$ patches, where $//$ denotes integer (floor) division. Each patch is then linearly projected into a vector of fixed length, commonly denoted $\text{num\_hiddens}$. A batch of $\text{batch\_size}$ images therefore yields an output of shape $(\text{batch\_size}, (\text{img\_size} // \text{patch\_size})^2, \text{num\_hiddens})$.
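
The following is a minimal pure-Python sketch of both the patch count and the per-patch projection. It assumes a square, single-channel image stored as a list of rows. The helper names (`num_patches`, `patch_embedding_output_shape`, `patchify`, `patch_embed`) are hypothetical illustrations and not the D2L API. A real model would use a tensor library, where a convolution whose kernel size and stride both equal patch_size computes the same shared linear projection of each non-overlapping patch. With several channels, each flattened patch would instead hold channels * patch_size ** 2 values.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def num_patches(img_size: int, patch_size: int) -> int:
    """Non-overlapping patches in a square image: (img_size // patch_size) ** 2."""
    return (img_size // patch_size) ** 2


def patch_embedding_output_shape(batch_size: int, img_size: int,
                                 patch_size: int, num_hiddens: int) -> Tuple[int, int, int]:
    """Shape of the embedded sequence: (batch_size, num_patches, num_hiddens)."""
    return (batch_size, num_patches(img_size, patch_size), num_hiddens)


def patchify(image: Sequence[Sequence[float]], patch_size: int) -> Matrix:
    """Split a square single-channel image into flattened patches, row-major order.

    Trailing rows/columns that cannot fill a whole patch are dropped,
    which is exactly what the floor division in the formula implies.
    """
    n = len(image) // patch_size  # patches per side
    patches = []
    for pi in range(n):
        for pj in range(n):
            patch = [image[pi * patch_size + r][pj * patch_size + c]
                     for r in range(patch_size) for c in range(patch_size)]
            patches.append(patch)
    return patches


def patch_embed(image: Sequence[Sequence[float]], patch_size: int,
                weights: Matrix, bias: List[float]) -> Matrix:
    """Apply the same linear map y = W x + b to every flattened patch.

    `weights` has num_hiddens rows, each of length patch_size ** 2,
    so every output row has length num_hiddens.
    """
    return [[sum(w * x for w, x in zip(row, patch)) + b
             for row, b in zip(weights, bias)]
            for patch in patchify(image, patch_size)]


if __name__ == "__main__":
    img_size, patch_size, num_hiddens = 7, 2, 3
    image = [[r * img_size + c for c in range(img_size)] for r in range(img_size)]
    # Row k of W scales the sum of a patch's pixels by (k + 1).
    weights = [[k + 1] * patch_size ** 2 for k in range(num_hiddens)]
    bias = [0] * num_hiddens
    out = patch_embed(image, patch_size, weights, bias)
    print(len(out), len(out[0]))  # 9 3  (7 // 2 = 3 patches per side)
    print(patch_embedding_output_shape(4, 96, 16, 512))  # (4, 36, 512)
```

The 7 x 7 example shows the effect of floor division: the last pixel row and column fall outside any full 2 x 2 patch, so only 3 x 3 = 9 patches are embedded.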

Updated 2026-05-15

Tags

D2L

Dive into Deep Learning @ D2L