1Cademy - Seq2Seq Training Execution for Machine Translation

Learn Before

Seq2Seq Model Implementation


To train an RNN encoder–decoder model for machine translation, instantiate the dataset, encoder, decoder, and model with their hyperparameters, then fit the model with the d2l.Trainer utility class. The configuration here loads the MTFraEng dataset with batch_size=128 and builds both the encoder and the decoder with an embedding size of 256, 256 hidden units, 2 GRU layers, and a dropout rate of 0.2. The two constructors differ only in their first argument: the encoder receives the source vocabulary size, and the decoder receives the target vocabulary size. The Seq2Seq model wraps the encoder and decoder and takes the index of the target-language padding token together with a learning rate of 0.005. Training runs for 30 epochs on a single GPU, with gradient clipping set to 1 to stabilize parameter updates in the recurrent network.

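# Assumes the PyTorch version of the d2l package, and that Seq2SeqEncoder,
# Seq2SeqDecoder, and Seq2Seq were defined beforehand (not created here).
from d2l import torch as d2l

# English-French translation dataset, served in minibatches of 128
data = d2l.MTFraEng(batch_size=128)
embed_size, num_hiddens, num_layers, dropout = 256, 256, 2, 0.2
# Same architecture on both sides; only the vocabulary size differs
encoder = Seq2SeqEncoder(
    len(data.src_vocab), embed_size, num_hiddens, num_layers, dropout)
decoder = Seq2SeqDecoder(
    len(data.tgt_vocab), embed_size, num_hiddens, num_layers, dropout)
# The model needs the index of the target-language padding token
model = Seq2Seq(encoder, decoder, tgt_pad=data.tgt_vocab['<pad>'],
                lr=0.005)
# 30 epochs on one GPU, with gradient clipping set to 1
trainer = d2l.Trainer(max_epochs=30, gradient_clip_val=1, num_gpus=1)
trainer.fit(model, data)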
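
The listing passes the target-language padding index to the model but does not show what the model does with it. A common use, assumed here, is to leave padded positions out of the training loss so that short sentences are not penalized for their filler tokens. The following is a minimal framework-free sketch of that idea; the function name masked_cross_entropy, the toy distributions, and the choice of 0 as the padding index are all hypothetical and are not taken from the d2l code.

```python
import math


def masked_cross_entropy(log_probs, targets, pad_index):
    """Average negative log-likelihood over non-padding target tokens.

    log_probs: one list of log-probabilities over the vocabulary per position.
    targets:   one target token index per position.
    pad_index: index of the padding token (hypothetical value in the demo).
    """
    total, count = 0.0, 0
    for position_log_probs, target in zip(log_probs, targets):
        if target == pad_index:
            continue  # padded positions contribute nothing to the loss
        total += -position_log_probs[target]
        count += 1
    # Guard against a sequence that consists only of padding
    return total / count if count else 0.0


# Toy example: vocabulary of 4 tokens, index 0 plays the role of '<pad>'
uniform = [math.log(0.25)] * 4
skewed = [math.log(0.7), math.log(0.1), math.log(0.1), math.log(0.1)]
log_probs = [uniform, uniform, skewed]
targets = [2, 3, 0]  # the last position is padding and is ignored
loss = masked_cross_entropy(log_probs, targets, pad_index=0)
```

Only the first two positions count toward the average in this toy example, so the prediction at the padded position has no effect on the result.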

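The effect of gradient clipping set to 1 can be illustrated in a few lines. This sketch assumes the clipping value is a bound on the joint L2 norm of all gradients, which is a common convention; the text above does not spell out the exact rule the trainer applies. The function name clip_by_global_norm and the toy gradient values are hypothetical.

```python
import math


def clip_by_global_norm(grads, clip_val):
    """Rescale gradients so that their joint L2 norm is at most clip_val.

    grads: list of gradient vectors, one per parameter (plain lists here).
    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if norm > clip_val:
        scale = clip_val / norm
        # Every gradient is scaled by the same factor, so the direction
        # of the overall update is preserved and only its length shrinks.
        grads = [[g * scale for g in grad] for grad in grads]
    return grads, norm


# Toy example: two parameters whose gradients have joint norm 5
grads = [[3.0, 0.0], [0.0, 4.0]]
clipped, norm_before = clip_by_global_norm(grads, clip_val=1.0)
norm_after = math.sqrt(sum(g * g for grad in clipped for g in grad))
```

Gradients whose joint norm is already below the threshold pass through unchanged, so clipping only intervenes on the occasional very large update.
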
Updated 2026-05-14

References

Dive into Deep Learning
