1Cademy - Seq2SeqAttentionDecoder Forward Pass


Activity (Process)

Seq2SeqAttentionDecoder Forward Pass

During the forward pass of the Seq2SeqAttentionDecoder, the target token indices are first embedded and then transposed to shape (num_steps, batch_size, embed_size) so that the decoder can iterate over time steps. At each step:

(1) The final-layer hidden state from the previous time step is unsqueezed to shape (batch_size, 1, num_hiddens) to serve as the attention query. At the first step, this is the hidden state the decoder was initialized with.
(2) The additive attention mechanism computes a context vector of shape (batch_size, 1, num_hiddens) by attending over all encoder outputs, which serve as both keys and values. The valid lengths are used to mask padding positions.
(3) The current embedded input is unsqueezed to shape (batch_size, 1, embed_size) and concatenated with the context vector along the feature dimension.
(4) The concatenated tensor of shape (batch_size, 1, embed_size + num_hiddens) is transposed to shape (1, batch_size, embed_size + num_hiddens) and fed into the GRU, which updates the decoder's hidden state.

After all time steps have been processed, the per-step GRU outputs are concatenated along the time dimension, projected through a dense layer to vocabulary size, and transposed back to batch-major order, giving predictions of shape (batch_size, num_steps, vocab_size). The attention weights from each step are stored for later visualization.
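The forward pass described above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not a reference implementation: the AdditiveAttention class, the constructor arguments, the attribute names, the order in which context and input are concatenated, and the toy sizes at the bottom are all assumptions chosen to match the shapes in the description.

```python
import torch
from torch import nn


class AdditiveAttention(nn.Module):
    """Assumed additive attention; masks padding using valid lengths."""

    def __init__(self, query_size, key_size, num_hiddens):
        super().__init__()
        self.W_q = nn.Linear(query_size, num_hiddens, bias=False)
        self.W_k = nn.Linear(key_size, num_hiddens, bias=False)
        self.w_v = nn.Linear(num_hiddens, 1, bias=False)

    def forward(self, queries, keys, values, valid_lens):
        # queries: (batch, 1, q); keys, values: (batch, src_steps, k)
        features = torch.tanh(
            self.W_q(queries).unsqueeze(2) + self.W_k(keys).unsqueeze(1))
        scores = self.w_v(features).squeeze(-1)  # (batch, 1, src_steps)
        # Positions at or beyond each sequence's valid length are padding.
        steps = torch.arange(keys.shape[1], device=keys.device)
        mask = steps[None, None, :] < valid_lens[:, None, None]
        scores = scores.masked_fill(~mask, -1e6)
        self.attention_weights = torch.softmax(scores, dim=-1)
        return torch.bmm(self.attention_weights, values)  # (batch, 1, k)


class Seq2SeqAttentionDecoder(nn.Module):
    def __init__(self, vocab_size, embed_size, num_hiddens, num_layers):
        super().__init__()
        self.attention = AdditiveAttention(num_hiddens, num_hiddens, num_hiddens)
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.rnn = nn.GRU(embed_size + num_hiddens, num_hiddens, num_layers)
        self.dense = nn.Linear(num_hiddens, vocab_size)

    def forward(self, X, state):
        # enc_outputs: (batch, src_steps, num_hiddens)
        # hidden_state: (num_layers, batch, num_hiddens)
        enc_outputs, hidden_state, enc_valid_lens = state
        # Embed, then move time first: (num_steps, batch, embed_size)
        X = self.embedding(X).permute(1, 0, 2)
        outputs, self.attention_weights = [], []
        for x in X:
            # (1) query from the final layer: (batch, 1, num_hiddens)
            query = hidden_state[-1].unsqueeze(1)
            # (2) context vector: (batch, 1, num_hiddens)
            context = self.attention(
                query, enc_outputs, enc_outputs, enc_valid_lens)
            # (3) concatenate on the feature dimension (order is an assumption)
            x = torch.cat((context, x.unsqueeze(1)), dim=-1)
            # (4) time-major input to the GRU: (1, batch, embed + hiddens)
            out, hidden_state = self.rnn(x.permute(1, 0, 2), hidden_state)
            outputs.append(out)
            self.attention_weights.append(self.attention.attention_weights)
        # (num_steps, batch, vocab_size) after the dense layer
        outputs = self.dense(torch.cat(outputs, dim=0))
        return outputs.permute(1, 0, 2), [enc_outputs, hidden_state,
                                          enc_valid_lens]


# Toy sizes and random encoder outputs, purely for shape checking.
batch_size, src_steps, num_steps = 4, 7, 5
vocab_size, embed_size, num_hiddens, num_layers = 10, 8, 16, 2
decoder = Seq2SeqAttentionDecoder(vocab_size, embed_size, num_hiddens, num_layers)
enc_outputs = torch.randn(batch_size, src_steps, num_hiddens)
hidden_state = torch.zeros(num_layers, batch_size, num_hiddens)
enc_valid_lens = torch.tensor([7, 5, 3, 2])
X = torch.zeros((batch_size, num_steps), dtype=torch.long)
Y, state = decoder(X, [enc_outputs, hidden_state, enc_valid_lens])
print(Y.shape, state[1].shape)
```

Under these assumptions, Y has shape (batch_size, num_steps, vocab_size) and the returned hidden state keeps shape (num_layers, batch_size, num_hiddens). The list decoder.attention_weights holds one tensor of shape (batch_size, 1, src_steps) per decoding step, which is the form a heatmap visualization would consume.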


Updated 2026-06-27
