Activity (Process)

Seq2SeqAttentionDecoder Forward Pass

During the forward pass of the Seq2SeqAttentionDecoder, the target token indices are first embedded and transposed to (num_steps, batch_size, embed_size). The decoder then iterates over the time steps. At each step: (1) the last-layer hidden state from the previous time step is reshaped into a query of shape (batch_size, 1, num_hiddens); (2) the additive attention mechanism computes a context vector of shape (batch_size, 1, num_hiddens) by attending over all encoder outputs, which serve as both keys and values, with padding positions excluded via the encoder valid lengths; (3) this context vector is concatenated with the current embedded input along the feature dimension; (4) the concatenated tensor, whose feature size is embed_size + num_hiddens, is fed through the GRU, which updates the hidden state. After all time steps have been processed, the collected GRU outputs are concatenated along the time dimension and projected through a dense layer, and the result is transposed to produce predictions of shape (batch_size, num_steps, vocab_size). The attention weights from every step are stored for later inspection.
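
The steps above can be traced with a small NumPy sketch. It is not the D2L implementation: it assumes a single-layer, simplified GRU cell (so the last layer is the only layer, and there is no separate hidden-side bias), random untrained weights, and toy sizes. All names here (decoder_forward, gru_step, params, enc_steps, and so on) are hypothetical and chosen only for illustration. The aim is to make the tensor shapes at each step concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, chosen only for illustration.
batch_size, num_steps, enc_steps = 2, 4, 5
vocab_size, embed_size, num_hiddens = 10, 8, 16

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def masked_softmax(scores, valid_lens):
    # scores: (batch_size, 1, enc_steps); positions >= valid_len get ~zero weight.
    mask = np.arange(scores.shape[-1])[None, None, :] < valid_lens[:, None, None]
    scores = np.where(mask, scores, -1e6)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

def additive_attention(query, keys, values, valid_lens, W_q, W_k, w_v):
    # query: (batch_size, 1, num_hiddens); keys, values: (batch_size, enc_steps, num_hiddens)
    features = np.tanh((query @ W_q)[:, :, None, :] + (keys @ W_k)[:, None, :, :])
    scores = features @ w_v                      # (batch_size, 1, enc_steps)
    weights = masked_softmax(scores, valid_lens)
    return weights @ values, weights             # context: (batch_size, 1, num_hiddens)

def gru_step(x, h, W_x, W_h, b):
    # Simplified single-layer GRU cell (assumption: no separate hidden-side bias).
    H = h.shape[-1]
    gx, gh = x @ W_x + b, h @ W_h
    r = sigmoid(gx[:, :H] + gh[:, :H])               # reset gate
    z = sigmoid(gx[:, H:2 * H] + gh[:, H:2 * H])     # update gate
    n = np.tanh(gx[:, 2 * H:] + r * gh[:, 2 * H:])   # candidate state
    return (1 - z) * n + z * h

def decoder_forward(X, enc_outputs, hidden_state, enc_valid_lens, params):
    # X: (batch_size, num_steps) token indices; hidden_state: (batch_size, num_hiddens)
    embedded = params["embedding"][X].transpose(1, 0, 2)  # (num_steps, batch_size, embed_size)
    outputs, attention_weights = [], []
    for x in embedded:                                    # x: (batch_size, embed_size)
        query = hidden_state[:, None, :]                  # (1) (batch_size, 1, num_hiddens)
        context, weights = additive_attention(            # (2) attend over encoder outputs
            query, enc_outputs, enc_outputs, enc_valid_lens,
            params["W_q"], params["W_k"], params["w_v"])
        gru_in = np.concatenate([context[:, 0, :], x], axis=-1)  # (3) embed_size + num_hiddens
        hidden_state = gru_step(gru_in, hidden_state,     # (4) update the hidden state
                                params["W_x"], params["W_h"], params["b"])
        outputs.append(hidden_state)
        attention_weights.append(weights)
    stacked = np.stack(outputs, axis=0)                   # (num_steps, batch_size, num_hiddens)
    logits = stacked @ params["W_out"] + params["b_out"]  # dense projection
    return logits.transpose(1, 0, 2), hidden_state, attention_weights

def init(*shape):
    return rng.normal(scale=0.1, size=shape)

params = {
    "embedding": init(vocab_size, embed_size),
    "W_q": init(num_hiddens, num_hiddens),
    "W_k": init(num_hiddens, num_hiddens),
    "w_v": init(num_hiddens),
    "W_x": init(embed_size + num_hiddens, 3 * num_hiddens),
    "W_h": init(num_hiddens, 3 * num_hiddens),
    "b": np.zeros(3 * num_hiddens),
    "W_out": init(num_hiddens, vocab_size),
    "b_out": np.zeros(vocab_size),
}

X = rng.integers(0, vocab_size, size=(batch_size, num_steps))
enc_outputs = init(batch_size, enc_steps, num_hiddens)
init_state = enc_outputs[:, -1, :]      # stand-in for the encoder's final hidden state
enc_valid_lens = np.array([3, 5])       # first source sequence has 2 padded positions

preds, final_state, attn = decoder_forward(X, enc_outputs, init_state, enc_valid_lens, params)
print(preds.shape)    # (2, 4, 10)
print(attn[0].shape)  # (2, 1, 5)
```

With enc_valid_lens set to [3, 5], the attention weights for the first sequence are zero at its last two (padded) encoder positions, which is the masking described in step (2).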

Updated 2026-05-14

Tags

D2L

Dive into Deep Learning @ D2L