Concept
LSTM Initial States Initialization
When implementing a Long Short-Term Memory (LSTM) network, processing a new sequence with no prior state requires explicitly initializing two distinct state variables: the hidden state ($\mathbf{H}$) and the memory cell internal state ($\mathbf{C}$). Both are typically initialized as zero tensors of shape (batch size, number of hidden units). This dual-state initialization gives the gates and the input node a neutral starting point before the first sequence token is processed.
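The initialization described above can be sketched in a few lines. This is a minimal illustration using NumPy, not the code of any particular library. The function names `init_lstm_state` and `resolve_state`, the `H_C` argument, and the input layout (number of steps, batch size, number of inputs) are assumptions made for the example.

```python
import numpy as np


def init_lstm_state(batch_size, num_hiddens, dtype=np.float32):
    """Return zero-initialized (H, C) for a sequence with no prior state."""
    shape = (batch_size, num_hiddens)
    H = np.zeros(shape, dtype=dtype)  # hidden state
    C = np.zeros(shape, dtype=dtype)  # memory cell internal state
    return H, C


def resolve_state(inputs, num_hiddens, H_C=None):
    """Reuse a carried-over state if one is given, otherwise start from zeros.

    Assumes `inputs` has layout (num_steps, batch_size, num_inputs),
    so the batch size is read from axis 1.
    """
    if H_C is None:
        return init_lstm_state(inputs.shape[1], num_hiddens)
    return H_C


# Hypothetical sizes, chosen only for illustration.
X = np.ones((5, 4, 8), dtype=np.float32)  # 5 steps, batch of 4, 8 features
H, C = resolve_state(X, num_hiddens=16)
print(H.shape, C.shape)
```

With these sizes, both arrays have shape (4, 16). The two states are allocated as separate arrays so that updating one during the first time step cannot alias the other.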
Updated 2026-05-14
Tags
D2L
Dive into Deep Learning @ D2L