Within a sequence-to-sequence encoder, an RNN processes the input sequence one token at a time. At each time step $$t$$, the recurrent layer applies a transformation function $$f$$ that combines the input feature vector $$\mathbf{x}_t$$ (derived from the $$t^{\textrm{th}}$$ token $$x_t$$) with the hidden state $$\mathbf{h}_{t-1}$$ carried from the preceding time step to produce the updated hidden state $$\mathbf{h}_t$$:

$$\mathbf{h}_t = f(\mathbf{x}_t, \mathbf{h}_{t-1})$$

This recurrence captures how the encoder incrementally builds a representation of the input sequence, with each hidden state $$\mathbf{h}_t$$ encoding information about all tokens observed up to and including position $$t$$.
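The recurrence can be illustrated with a minimal NumPy sketch. It assumes $$f$$ is a vanilla tanh RNN cell; the text leaves $$f$$ unspecified, so this is only one possible choice, and the names `rnn_step`, `encode`, `W_xh`, `W_hh`, and `b_h` are hypothetical.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One application of f, here assumed to be a vanilla tanh RNN cell."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

def encode(X, W_xh, W_hh, b_h):
    """Run h_t = f(x_t, h_{t-1}) over X of shape (num_steps, embed_size)."""
    h = np.zeros(W_hh.shape[0])  # h_0: initial hidden state (assumed zero)
    states = []
    for x_t in X:  # one token's feature vector per time step
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return np.stack(states)  # shape (num_steps, num_hiddens)

rng = np.random.default_rng(0)
num_steps, embed_size, num_hiddens = 5, 4, 3
X = rng.normal(size=(num_steps, embed_size))
W_xh = rng.normal(size=(embed_size, num_hiddens)) * 0.1
W_hh = rng.normal(size=(num_hiddens, num_hiddens)) * 0.1
b_h = np.zeros(num_hiddens)

H = encode(X, W_xh, W_hh, b_h)

# Perturb the feature vector at index 3 and re-run the encoder.
X_changed = X.copy()
X_changed[3] += 1.0
H_changed = encode(X_changed, W_xh, W_hh, b_h)

print(H.shape)
print(np.allclose(H[:3], H_changed[:3]))  # earlier states are unaffected
```

Because each hidden state depends only on the current input and the previous state, changing a token alters the hidden states from that position onward and leaves the earlier ones untouched.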


In an encoder–decoder architecture, the encoder accepts a variable-length sequence as input and transforms it into an encoded state with a fixed shape. This state acts as a compressed summary of the context in the original sequence. In a common approach, the encoder uses an RNN to compute hidden states $$\mathbf{h}_1, \ldots, \mathbf{h}_T$$ for all $$T$$ time steps and then derives the context variable $$\mathbf{c}$$ through a chosen function $$q$$: $$\mathbf{c} = q(\mathbf{h}_1, \ldots, \mathbf{h}_T)$$. A simple choice for $$q$$ is to set $$\mathbf{c} = \mathbf{h}_T$$, so that the final hidden state alone serves as the context.
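As a small sketch of the role of $$q$$, the snippet below applies two candidate functions to hidden-state matrices of different lengths. `q_last` corresponds to $$\mathbf{c} = \mathbf{h}_T$$; `q_mean` is a hypothetical alternative included only to show that $$q$$ can be any function that yields a fixed-shape result. The hidden states are random placeholders, not the output of a trained encoder.

```python
import numpy as np

def q_last(H):
    """c = h_T: use the final hidden state as the context."""
    return H[-1]

def q_mean(H):
    """Hypothetical alternative: average the hidden states over time."""
    return H.mean(axis=0)

rng = np.random.default_rng(0)
num_hiddens = 3

# Placeholder hidden states for a short (T=2) and a long (T=7) sequence.
H_short = rng.normal(size=(2, num_hiddens))
H_long = rng.normal(size=(7, num_hiddens))

c_short = q_last(H_short)
c_long = q_last(H_long)

# The context has the same shape regardless of the sequence length T.
print(c_short.shape, c_long.shape)
print(q_mean(H_long).shape)
```

Either choice maps a variable number of hidden states to a vector of length `num_hiddens`, which is what lets a decoder consume inputs of any length through a single fixed-shape state.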

Encoder Transformation to a Fixed-Shape State

Dive into Deep Learning

RNN Encoder Hidden State Recurrence

The `Seq2SeqEncoder` class implements the RNN-based encoder for sequence-to-sequence learning by extending a base `Encoder` interface. Its architecture consists of two primary components: an **embedding layer** that converts each input token index into a dense feature vector, and a **multilayer GRU** that processes the resulting sequence of embeddings. The embedding layer's weight matrix has shape (`vocab_size`, `embed_size`), where row $$i$$ stores the feature vector for the token with index $$i$$. During the forward pass, the input tensor of shape (`batch_size`, `num_steps`) is first transposed and embedded to produce a tensor of shape (`num_steps`, `batch_size`, `embed_size`). The GRU then processes this sequence and returns two outputs: `outputs` of shape (`num_steps`, `batch_size`, `num_hiddens`), containing the final-layer hidden states at every time step, and `state` of shape (`num_layers`, `batch_size`, `num_hiddens`), containing the hidden states of all layers at the final time step. All weights are initialized using Xavier initialization.
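The description above can be approximated in PyTorch as follows. This is a self-contained sketch rather than the reference implementation: the `Encoder` base class here is a minimal stand-in, `torch.nn.GRU` is used directly, and Xavier initialization is applied only to the GRU weight matrices, which is an assumption about the scope of the initialization.

```python
import torch
from torch import nn

class Encoder(nn.Module):
    """Minimal stand-in for the base encoder interface (assumed)."""
    def forward(self, X, *args):
        raise NotImplementedError

class Seq2SeqEncoder(Encoder):
    """RNN encoder sketch: embedding layer followed by a multilayer GRU."""
    def __init__(self, vocab_size, embed_size, num_hiddens, num_layers,
                 dropout=0.0):
        super().__init__()
        # Weight matrix of shape (vocab_size, embed_size); row i is token i.
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.rnn = nn.GRU(embed_size, num_hiddens, num_layers,
                          dropout=dropout)
        # Assumption: Xavier-initialize the GRU weight matrices only.
        for name, param in self.rnn.named_parameters():
            if "weight" in name:
                nn.init.xavier_uniform_(param)

    def forward(self, X, *args):
        # X: (batch_size, num_steps) -> embs: (num_steps, batch_size, embed_size)
        embs = self.embedding(X.t().long())
        # outputs: (num_steps, batch_size, num_hiddens)
        # state:   (num_layers, batch_size, num_hiddens)
        outputs, state = self.rnn(embs)
        return outputs, state

vocab_size, embed_size, num_hiddens, num_layers = 10, 8, 16, 2
batch_size, num_steps = 4, 9

encoder = Seq2SeqEncoder(vocab_size, embed_size, num_hiddens, num_layers)
X = torch.zeros((batch_size, num_steps), dtype=torch.long)
outputs, state = encoder(X)

print(outputs.shape)  # torch.Size([9, 4, 16])
print(state.shape)    # torch.Size([2, 4, 16])
```

For a unidirectional GRU, the last time step of `outputs` coincides with the last layer of `state`, since both hold the final-layer hidden state at the final time step.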

Seq2SeqEncoder Implementation

In the encoder component of a sequence-to-sequence model, the choice between a unidirectional and a bidirectional RNN determines how much input context each hidden state can capture. A **unidirectional RNN encoder** computes hidden states that depend solely on the input subsequence at and before the current time step, meaning each $$\mathbf{h}_t$$ encodes only the tokens $$x_1, \ldots, x_t$$. In contrast, a **bidirectional RNN encoder** produces hidden states that also incorporate information from tokens appearing after the current position, effectively encoding the information of the entire input sequence at every time step. While a bidirectional encoder provides richer context, the unidirectional design is simpler and sufficient for many seq2seq applications.
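The difference in context can be checked empirically. The sketch below uses `torch.nn.GRU` with and without `bidirectional=True` on random, untrained weights and random input features (all sizes are arbitrary illustrative choices), perturbs only the last time step, and compares the hidden state at the first time step.

```python
import torch
from torch import nn

torch.manual_seed(0)
num_steps, batch_size, embed_size, num_hiddens = 4, 1, 4, 5

uni = nn.GRU(embed_size, num_hiddens)
bi = nn.GRU(embed_size, num_hiddens, bidirectional=True)

X = torch.randn(num_steps, batch_size, embed_size)
X_changed = X.clone()
X_changed[-1] += 1.0  # perturb only the last token's features

with torch.no_grad():
    uni_out, _ = uni(X)
    uni_out_changed, _ = uni(X_changed)
    bi_out, _ = bi(X)
    bi_out_changed, _ = bi(X_changed)

# Unidirectional: the first hidden state cannot see the last token.
uni_first_unchanged = torch.allclose(uni_out[0], uni_out_changed[0])
# Bidirectional: the backward direction carries the last token to step 0.
bi_first_unchanged = torch.allclose(bi_out[0], bi_out_changed[0])

print(uni_out.shape)  # torch.Size([4, 1, 5])
print(bi_out.shape)   # torch.Size([4, 1, 10])
print(uni_first_unchanged, bi_first_unchanged)
```

The bidirectional output concatenates forward and backward hidden states, so its last dimension is `2 * num_hiddens`; only the backward half of the first time step reacts to the perturbation.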
