Learn Before
Comparison

Comparing CNN, RNN, and Self-Attention Architectures

When evaluating architectures such as CNNs, RNNs, and self-attention for mapping an input sequence of nn tokens to an output sequence of the same length (with each token represented as a dd-dimensional vector), three main properties are compared: computational complexity, sequential operations, and maximum path lengths. A smaller number of sequential operations is desirable as it allows for parallel computation, while a shorter maximum path length between tokens makes it easier for the network to learn long-range dependencies.

Image 0

0

1

Updated 2026-05-14

Contributors are:

Who are from:

Tags

D2L

Dive into Deep Learning @ D2L