Scalability in Vision Transformers

When trained on massive datasets, such as those with hundreds of millions of images, Vision Transformers scale better than convolutional architectures like ResNets. In these large-scale settings, Vision Transformers outperform ResNets by a significant margin in image classification. This suggests that, given enough data and model capacity, a model can learn the spatial structure that convolutions build in as inductive biases (locality and translation invariance), so those built-in biases are no longer needed.
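To make "built-in spatial inductive bias" concrete, the following minimal sketch counts how many input positions a single output position can read in one layer of each design. It is an illustration only, not code from the source: the 224x224 image size, the 16x16 patch size, the 3x3 kernel, and both function names are assumptions chosen for the example.

```python
def conv_receptive_field(height, width, row, col, kernel=3):
    """Pixels read by one output pixel of a single kernel x kernel
    convolution (stride 1; positions outside the image are dropped)."""
    half = kernel // 2
    return {
        (r, c)
        for r in range(row - half, row + half + 1)
        for c in range(col - half, col + half + 1)
        if 0 <= r < height and 0 <= c < width
    }


def attention_receptive_field(height, width, patch):
    """Patches read by one output token of a single self-attention
    layer: every patch in the image, regardless of distance."""
    rows, cols = height // patch, width // patch
    return {(r, c) for r in range(rows) for c in range(cols)}


# Hypothetical sizes, chosen only for illustration.
height = width = 224
patch = 16

conv_inputs = conv_receptive_field(height, width, row=100, col=100)
attn_inputs = attention_receptive_field(height, width, patch)

print(len(conv_inputs))  # 9 pixels: a fixed local neighborhood
print(len(attn_inputs))  # 196 patches: the whole 14 x 14 patch grid
```

The convolution is restricted to a local neighborhood by construction, while self-attention may relate any pair of patches and has to learn from data which relations matter. That is one way to read why the advantage of Vision Transformers appears mainly at large data scale.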

Updated 2026-05-15

Tags

D2L

Dive into Deep Learning @ D2L