Example of Training a Deep GRU Model
As an example, a deep Gated Recurrent Unit (GRU) language model can be trained by stacking more than one hidden layer, for instance by setting the num_layers parameter to 2. The other architectural choices and hyperparameters match those of a single-layer network: the number of inputs and the number of outputs both equal the number of distinct tokens (vocab_size), and the number of hidden units is set to a typical value (e.g., 32). The only structural difference is that the number of hidden layers is specified explicitly.
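The setup described above might look roughly like the following PyTorch sketch. It is an illustration under stated assumptions, not the original implementation: the class name DeepGRULM, the toy vocabulary size of 28, the batch shape, the learning rate, and the random token data are all hypothetical stand-ins. Only num_layers=2, the 32 hidden units, and the use of vocab_size for both inputs and outputs come from the text.

```python
import torch
from torch import nn
from torch.nn import functional as F


class DeepGRULM(nn.Module):
    """Hypothetical minimal language model: one-hot input -> stacked GRU -> linear output."""

    def __init__(self, vocab_size, num_hiddens=32, num_layers=2):
        super().__init__()
        self.vocab_size = vocab_size
        # Inputs are one-hot vectors, so the GRU input size equals vocab_size.
        self.rnn = nn.GRU(vocab_size, num_hiddens, num_layers=num_layers)
        # The output layer maps each hidden state back to vocab_size logits.
        self.out = nn.Linear(num_hiddens, vocab_size)

    def forward(self, tokens, state=None):
        # tokens: (batch_size, num_steps) integer ids.
        # After transposing and encoding: (num_steps, batch_size, vocab_size).
        X = F.one_hot(tokens.T, self.vocab_size).float()
        # H holds the top layer's hidden state at every time step.
        H, state = self.rnn(X, state)
        return self.out(H), state


torch.manual_seed(0)
vocab_size, batch_size, num_steps = 28, 4, 10  # assumed toy sizes
model = DeepGRULM(vocab_size, num_hiddens=32, num_layers=2)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)  # assumed learning rate

# Random token ids stand in for a real corpus; targets are the inputs shifted by one.
data = torch.randint(0, vocab_size, (batch_size, num_steps + 1))
inputs, targets = data[:, :-1], data[:, 1:]

# One training step: forward pass, loss, backward pass, clipped update.
logits, state = model(inputs)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.T.reshape(-1))
optimizer.zero_grad()
loss.backward()
nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

Changing num_layers back to 1 recovers the single-layer model with no other code changes, which is the sense in which the deep variant differs only in the number of hidden layers. The returned state has one slice per layer, so its leading dimension equals num_layers.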
Updated 2026-06-17
Tags
D2L
Dive into Deep Learning @ D2L