To demonstrate the baseline behavior of a recurrent neural network (RNN) language model prior to optimization, we can invoke its `predict` method using randomly initialized weights. Providing a short prefix string such as 'it has' results in a repetitive and nonsensical sequence of characters, illustrating that the model lacks the linguistic representations necessary to generate coherent text:

```python
model.predict('it has', 20, data.vocab)
```

Output:
```text
'it hasoadd dd dd dd dd dd '
```

Example of Untrained Concise RNN Language Model Prediction

A complete RNN-based language model can be built concisely by inheriting from a base class such as `RNNLMScratch`, which already handles the training loop and sequence generation. In this high-level implementation, the only component left to define is a separate fully connected output layer (e.g., `nn.LazyLinear` in PyTorch or `tf.keras.layers.Dense` in TensorFlow) sized to the vocabulary. During the forward pass, this linear layer projects the hidden states produced by the high-level RNN module into unnormalized scores (logits) over the vocabulary, from which the next token is predicted.


The `RNNLMScratch` class implements an RNN-based language model from scratch by composing a previously defined RNN module with an output projection layer. It extends a `Classifier` base class and accepts an RNN instance, the vocabulary size, and a learning rate as constructor arguments. Because a language model's inputs and outputs are drawn from the same vocabulary, both share the same dimensionality, which equals the vocabulary size. The output layer is defined by a learnable weight matrix $$\mathbf{W}_{hq} \in \mathbb{R}^{h \times q}$$ (where $$h$$ is the number of hidden units and $$q$$ is the vocabulary size), initialized from a scaled normal distribution, and a bias vector $$\mathbf{b}_q \in \mathbb{R}^{q}$$, initialized to zeros. These parameters project each hidden state to a vector of logits over the vocabulary.
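The parameter shapes and the projection described above can be sketched in plain Python. This is only an illustration, not the class's actual code: the helper names `init_output_layer` and `project`, the scale `sigma=0.01`, and the sizes 32 and 28 are assumptions chosen for the example.

```python
import random

def init_output_layer(num_hiddens, vocab_size, sigma=0.01, seed=0):
    """Return (W_hq, b_q): W_hq is h x q with entries from N(0, sigma^2), b_q is zeros.

    sigma=0.01 is an assumed scale; the source only says "scaled normal".
    """
    rng = random.Random(seed)
    W_hq = [[rng.gauss(0.0, sigma) for _ in range(vocab_size)]
            for _ in range(num_hiddens)]
    b_q = [0.0] * vocab_size
    return W_hq, b_q

def project(hidden, W_hq, b_q):
    """Map one hidden state (length h) to logits (length q): hidden @ W_hq + b_q."""
    return [sum(h_i * W_hq[i][j] for i, h_i in enumerate(hidden)) + b_q[j]
            for j in range(len(b_q))]

# Illustrative sizes: 32 hidden units, a 28-token vocabulary.
W_hq, b_q = init_output_layer(num_hiddens=32, vocab_size=28)
logits = project([0.5] * 32, W_hq, b_q)
print(len(logits))  # one logit per vocabulary entry
```

A real implementation would store these as framework tensors so that gradients can flow through them; nested lists are used here only to keep the shapes visible.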

RNNLMScratch Class

Dive into Deep Learning

In a recurrent neural network (RNN) language model, a fully connected output layer transforms the sequence of RNN hidden states into token predictions, one per time step. The layer projects each hidden representation into the vocabulary space, producing unnormalized scores (logits) that indicate how likely each token in the vocabulary is to come next in the sequence.
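Written as an equation, and assuming a minibatch of $$n$$ sequences with hidden state $$\mathbf{H}_t \in \mathbb{R}^{n \times h}$$ at time step $$t$$ (the symbols $$n$$, $$\mathbf{H}_t$$, and $$\mathbf{O}_t$$ are notation introduced here for illustration), the projection is:

$$\mathbf{O}_t = \mathbf{H}_t \mathbf{W}_{hq} + \mathbf{b}_q, \qquad \mathbf{O}_t \in \mathbb{R}^{n \times q}$$

Each row of $$\mathbf{O}_t$$ holds the $$q$$ logits for one sequence in the batch. Applying a softmax to a row turns those logits into a probability distribution over the next token.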

Output Layer Transformation in RNN Language Models

To train a character-level language model from scratch, an `RNNLMScratch` instance is initialized with a recurrent module (such as `RNNScratch`) and the dataset's vocabulary size. The model is then trained on a sequential dataset (such as The Time Machine corpus) using a trainer utility class. The trainer is configured with a gradient clipping value (e.g., `gradient_clip_val=1`) so that gradients are clipped before each parameter update.

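```python
# Assumes the PyTorch variant of the d2l package, and that RNNScratch and
# RNNLMScratch have already been defined earlier in the same session.
from d2l import torch as d2l

# Character-level dataset: minibatches of 1024 sequences, 32 time steps each.
data = d2l.TimeMachine(batch_size=1024, num_steps=32)
# One-hot inputs, so the input size equals the vocabulary size.
rnn = RNNScratch(num_inputs=len(data.vocab), num_hiddens=32)
model = RNNLMScratch(rnn, vocab_size=len(data.vocab), lr=1)
# gradient_clip_val=1 clips gradients before each parameter update.
trainer = d2l.Trainer(max_epochs=100, gradient_clip_val=1, num_gpus=1)
trainer.fit(model, data)
```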
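One common way to clip gradients is to rescale them whenever their global L2 norm exceeds the threshold. The sketch below assumes that strategy; the text only states that clipping happens, not how. The function name `clip_gradients` and the list-of-lists representation of gradients are hypothetical and used for illustration only.

```python
import math

def clip_gradients(grads, clip_val):
    """Rescale all gradients so that their global L2 norm is at most clip_val.

    `grads` is a list of per-parameter gradient lists (an assumed layout).
    """
    norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if norm > clip_val:
        scale = clip_val / norm
        return [[g * scale for g in grad] for grad in grads]
    # Norm already within the threshold: return an unchanged copy.
    return [list(grad) for grad in grads]

# Two parameters whose gradients have a global norm of 5.
clipped = clip_gradients([[3.0, 0.0], [4.0]], clip_val=1.0)
print(clipped)
```

Rescaling keeps the direction of the update and bounds only its size, which helps against the exploding gradients that long sequences can cause in RNN training.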

Training Execution for RNNLMScratch

The `predict` method in the `RNNLMScratch` class generates a continuation of characters from a user-provided prefix. It initializes the hidden state and processes the prefix during a warm-up period to build context; predictions made during warm-up are discarded and the next prefix character is used as input instead. After the warm-up, it iteratively predicts a specified number of subsequent characters. At each generation step, the output layer maps the hidden state to logits over the vocabulary, the index with the highest score is selected, and the corresponding character is fed back into the RNN as the input for the next time step.
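The warm-up and generation phases can be sketched as a single loop. This is a minimal illustration, not the class's actual method: `generate` and `toy_step` are hypothetical names, and `toy_step` is a stand-in for the RNN plus output layer that simply favors the next token in cyclic order.

```python
def generate(prefix, num_preds, step, vocab):
    """Greedy generation: warm up on `prefix`, then predict `num_preds` tokens.

    `step(idx, state)` must return (logits, new_state) for one input token.
    """
    token_to_idx = {tok: i for i, tok in enumerate(vocab)}
    state = None
    outputs = [token_to_idx[prefix[0]]]
    for i in range(len(prefix) + num_preds - 1):
        logits, state = step(outputs[-1], state)
        if i < len(prefix) - 1:
            # Warm-up: ignore the prediction, feed the next prefix token.
            outputs.append(token_to_idx[prefix[i + 1]])
        else:
            # Generation: pick the highest-scoring index and feed it back.
            outputs.append(max(range(len(logits)), key=logits.__getitem__))
    return "".join(vocab[i] for i in outputs)

def toy_step(idx, state):
    """Hypothetical stand-in model: puts all score on the cyclically next token."""
    vocab_size = 4
    logits = [0.0] * vocab_size
    logits[(idx + 1) % vocab_size] = 1.0
    return logits, state

print(generate("ab", 3, toy_step, "abcd"))  # prints "abcda"
```

Replacing `toy_step` with a function that runs one RNN time step and applies the output layer gives the greedy decoding behavior described above.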
