Concept

LSTM Initial States Initialization

When a Long Short-Term Memory (LSTM) network begins processing a new sequence with no prior state to carry over, two distinct state variables must be initialized explicitly: the hidden state (H) and the memory cell internal state (C). Both are typically zero tensors of shape (batch size, number of hidden units). This dual-state initialization gives the gating mechanisms and the input node a neutral starting point before the first token of the sequence is processed.
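
The initialization described above can be sketched in a few lines. This is a minimal, dependency-free illustration rather than a reference implementation: the function name `init_lstm_state` and the parameter names `batch_size` and `num_hiddens` are assumptions made for this example, and nested Python lists stand in for tensors.

```python
def init_lstm_state(batch_size, num_hiddens):
    """Return (H, C): zero-filled hidden state and memory cell internal state.

    Each state is a batch_size x num_hiddens nested list standing in for a
    tensor. The names used here are illustrative, not taken from any library.
    """
    if batch_size <= 0 or num_hiddens <= 0:
        raise ValueError("batch_size and num_hiddens must be positive")
    # Build every row separately so that no two rows share the same list.
    H = [[0.0] * num_hiddens for _ in range(batch_size)]
    C = [[0.0] * num_hiddens for _ in range(batch_size)]
    # H and C are separate objects: updating one must not change the other.
    return H, C


H, C = init_lstm_state(batch_size=2, num_hiddens=4)
print(len(H), len(H[0]))  # rows = batch size, columns = hidden units
```

In a tensor library the same idea is usually a single call to a zeros constructor per state, for example `torch.zeros((batch_size, num_hiddens))` in PyTorch, placed on the same device as the model parameters.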

Updated 2026-05-14

Tags

D2L

Dive into Deep Learning @ D2L