Within a sequence-to-sequence decoder, the RNN updates its hidden state at each time step $$t'$$ using a transformation function $$g$$ that takes three inputs: the previous target token $$y_{t'-1}$$, the encoder's context variable $$\mathbf{c}$$, and the decoder's hidden state from the preceding time step $$\mathbf{s}_{t'-1}$$. The resulting hidden state update is expressed as:

$$\mathbf{s}_{t'} = g(y_{t'-1}, \mathbf{c}, \mathbf{s}_{t'-1})$$

This formulation mirrors the encoder's recurrence but differs in a key way: the decoder's transformation incorporates the context variable $$\mathbf{c}$$ as an additional input at every time step, ensuring that the encoded source sequence information continuously influences the generation process.
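
The update above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the source does not specify the form of $$g$$, so the sketch assumes a single tanh layer applied to the concatenation of the embedded previous token, the context, and the previous state. The names `decoder_step`, `make_matrix`, `embedding`, the dimensions, and the choice to initialize the decoder state from the context are all hypothetical.

```python
import math
import random

def make_matrix(rows, cols, rng):
    # Small random weights; stands in for trained parameters (assumption).
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def decoder_step(y_prev_emb, c, s_prev, W, b):
    """One update s_t = g(y_{t-1}, c, s_{t-1}).

    Assumed form of g: tanh(W [y; c; s] + b), a single dense layer.
    """
    x = y_prev_emb + c + s_prev  # list concatenation: [y; c; s]
    return [
        math.tanh(sum(w * v for w, v in zip(row, x)) + bias)
        for row, bias in zip(W, b)
    ]

rng = random.Random(0)
emb_dim, ctx_dim, hid_dim, vocab_size = 3, 4, 4, 5

embedding = make_matrix(vocab_size, emb_dim, rng)          # token id -> vector
W = make_matrix(hid_dim, emb_dim + ctx_dim + hid_dim, rng)
b = [0.0] * hid_dim

c = [0.5, -0.2, 0.1, 0.3]  # context variable from the encoder, held fixed
s = list(c)                # assumption: initial decoder state copied from c
prev_tokens = [0, 2, 4]    # previous target tokens y_{t'-1}, one per step

states = []
for y_prev in prev_tokens:
    # The same c is passed in at every time step, as in the equation.
    s = decoder_step(embedding[y_prev], c, s, W, b)
    states.append(s)
```

Note that `c` is an argument of every call, while `s` is overwritten each step; that is the only structural difference from an encoder-style recurrence in this sketch.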


Within the encoder–decoder framework, the decoder produces the final output from the information gathered by the encoder. It maps the fixed-shape encoded state to a new, variable-length sequence. This ability to expand a fixed-shape context into an output of arbitrary length is what makes the architecture suitable for sequence-to-sequence problems such as machine translation.
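
One common way to obtain a variable-length output from a fixed-shape context is to keep generating tokens until an end-of-sequence token appears. The sketch below illustrates that loop with greedy selection. It is a toy under stated assumptions, not a reference implementation: the weights are random rather than trained, and `greedy_decode`, `step`, `BOS`, `EOS`, and the `max_len` cap are hypothetical names and choices that do not come from the source.

```python
import math
import random

rng = random.Random(0)
VOCAB, EMB, HID = 6, 3, 4
BOS, EOS = 0, 1  # assumed ids for begin- and end-of-sequence tokens

def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

E = rand_matrix(VOCAB, EMB)            # token embeddings
W = rand_matrix(HID, EMB + HID + HID)  # recurrent weights over [y; c; s]
V = rand_matrix(VOCAB, HID)            # readout from state to token scores

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def step(y_prev, c, s_prev):
    # Assumed form of the decoder update: tanh(W [y; c; s]).
    return [math.tanh(z) for z in matvec(W, E[y_prev] + c + s_prev)]

def greedy_decode(c, max_len=10):
    """Map a fixed-shape context c to a token list of variable length."""
    s, y_prev, output = list(c), BOS, []
    for _ in range(max_len):
        s = step(y_prev, c, s)
        scores = matvec(V, s)
        # Greedy choice: pick the highest-scoring token.
        y_prev = max(range(VOCAB), key=scores.__getitem__)
        if y_prev == EOS:
            break  # the output length is decided here, not by the shape of c
        output.append(y_prev)
    return output

tokens = greedy_decode([0.2, -0.1, 0.4, 0.0])
```

The context always has `HID` entries, yet the returned list can hold anywhere from zero to `max_len` tokens depending on when `EOS` is selected.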
