Concept

Seq2SeqAttentionDecoder State Initialization

The decoder state in Seq2SeqAttentionDecoder is initialized from the encoder outputs as a three-element tuple. (i) The first element is the encoder's last-layer hidden states at all time steps. These are permuted from the encoder's time-major shape (num_steps, batch_size, num_hiddens) to (batch_size, num_steps, num_hiddens) and serve as both the keys and the values of the attention mechanism. (ii) The second element is the encoder's hidden states for all layers at the final time step, with shape (num_layers, batch_size, num_hiddens). These initialize the decoder's GRU hidden state, whose last layer supplies the attention query at each decoding step. (iii) The third element is the valid lengths of the encoder inputs, which mask padding tokens during attention pooling.
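The following is a minimal, framework-free sketch of this state and how its three parts are used. Nested Python lists stand in for tensors. It makes a few assumptions. The encoder is a unidirectional multi-layer GRU, so the last layer of the final hidden state equals the outputs at the last time step. Scaled dot-product scoring replaces the additive attention that D2L's decoder actually uses, purely for brevity. Padding is masked with a score of -1e6 before the softmax. The helpers `permute_time_major`, `init_state`, `masked_softmax`, and `dot_attention` are hypothetical, simplified illustrations, not D2L's own implementations.

```python
import math


def permute_time_major(outputs):
    """Swap the first two axes: (num_steps, batch_size, num_hiddens) ->
    (batch_size, num_steps, num_hiddens), like outputs.permute(1, 0, 2)."""
    num_steps, batch_size = len(outputs), len(outputs[0])
    return [[outputs[t][b] for t in range(num_steps)] for b in range(batch_size)]


def init_state(enc_outputs, enc_valid_lens):
    """Hypothetical helper: build the three-element decoder state."""
    outputs, hidden_state = enc_outputs  # (steps, batch, hid), (layers, batch, hid)
    return (permute_time_major(outputs), hidden_state, enc_valid_lens)


def masked_softmax(scores, valid_len):
    """Softmax over one row of scores; positions >= valid_len are padding."""
    # Padding positions get a large negative score, so exp() underflows to 0.0.
    masked = [s if i < valid_len else -1e6 for i, s in enumerate(scores)]
    top = max(masked)
    exps = [math.exp(s - top) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(query, keys, values, valid_len):
    """Scaled dot-product attention for one example (stand-in for additive attention)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = masked_softmax(scores, valid_len)
    context = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
    return context, weights


# Toy encoder outputs: num_steps=3, batch_size=2, num_hiddens=2, num_layers=2.
outputs = [[[t + 1.0, b + 1.0] for b in range(2)] for t in range(3)]
# Assumption: unidirectional GRU, so the last layer's final hidden state == outputs[-1].
hidden_state = [[[0.0, 0.0] for _ in range(2)], outputs[-1]]
enc_valid_lens = [3, 1]  # the second sequence has only one real token

enc_kv, dec_hidden, valid_lens = init_state((outputs, hidden_state), enc_valid_lens)

for b in range(2):
    query = dec_hidden[-1][b]  # last-layer decoder hidden state as the query
    context, weights = dot_attention(query, enc_kv[b], enc_kv[b], valid_lens[b])
    print(b, [round(w, 3) for w in weights], [round(c, 3) for c in context])
# For b=1 the padded steps get zero weight: weights [1.0, 0.0, 0.0], context [1.0, 2.0].
```

Keys and values are the same permuted encoder outputs. Only the valid length decides how many time steps each example can attend to.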


Updated 2026-05-14


Tags

D2L

Dive into Deep Learning @ D2L