Concept
Seq2SeqAttentionDecoder State Initialization
The decoder state in the Seq2SeqAttentionDecoder is initialized from the encoder outputs as a three-element tuple: (i) the encoder's last-layer hidden states at all time steps, transposed to shape (batch_size, num_steps, num_hiddens), which serve as both the keys and values for the attention mechanism; (ii) the encoder's hidden states across all layers at the final time step, with shape (num_layers, batch_size, num_hiddens), used to initialize the decoder's GRU hidden state; and (iii) the valid lengths of the encoder inputs, used to mask padding tokens during attention pooling.
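The three-element tuple described above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the library's definitive code: the standalone function name `init_state`, the use of a plain `nn.GRU` as a stand-in for the encoder, and all sizes and valid lengths are chosen here only for demonstration.

```python
import torch
from torch import nn


def init_state(enc_outputs, enc_valid_lens):
    """Build the decoder's initial state tuple from the encoder outputs (sketch)."""
    # outputs:      (num_steps, batch_size, num_hiddens), last-layer states at every step
    # hidden_state: (num_layers, batch_size, num_hiddens), all layers at the final step
    outputs, hidden_state = enc_outputs
    # Put the batch axis first so each time step can act as a key-value pair.
    return (outputs.permute(1, 0, 2), hidden_state, enc_valid_lens)


# Hypothetical sizes, chosen only for illustration.
batch_size, num_steps, num_hiddens, num_layers, embed_size = 4, 7, 16, 2, 8

# Assumption: a plain nn.GRU stands in for the encoder; it is time-major by default.
encoder_rnn = nn.GRU(embed_size, num_hiddens, num_layers)
X = torch.randn(num_steps, batch_size, embed_size)
enc_outputs = encoder_rnn(X)  # returns (outputs, hidden_state)

# Hypothetical valid lengths: how many non-padding tokens each sequence has.
enc_valid_lens = torch.tensor([7, 5, 3, 2])

keys_values, hidden_state, valid_lens = init_state(enc_outputs, enc_valid_lens)
print(keys_values.shape)   # torch.Size([4, 7, 16])
print(hidden_state.shape)  # torch.Size([2, 4, 16])
print(valid_lens.shape)    # torch.Size([4])
```

The transpose matters because attention pooling here is batch-first: for each sequence in the batch, the `num_steps` encoder states form the set of key-value pairs, and the valid lengths indicate which of those positions are padding and should be masked out.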
Updated 2026-05-14
Tags
D2L
Dive into Deep Learning @ D2L