Learn Before
Comparison
Comparing CNN, RNN, and Self-Attention Architectures
When evaluating architectures such as CNNs, RNNs, and self-attention for mapping an input sequence of tokens to an output sequence of the same length (with each token represented as a -dimensional vector), three main properties are compared: computational complexity, sequential operations, and maximum path lengths. A smaller number of sequential operations is desirable as it allows for parallel computation, while a shorter maximum path length between tokens makes it easier for the network to learn long-range dependencies.
0
1
Updated 2026-05-14
Tags
D2L
Dive into Deep Learning @ D2L