Concept
LSTM Parameters Initialization
In a Long Short-Term Memory (LSTM) network, the learnable parameters are the weight matrices and bias vectors for the three gates (input, forget, and output) and for the input node. Each of these four components has an input-to-hidden weight matrix, a hidden-to-hidden weight matrix, and a bias vector, so their shapes are determined by the input size and the chosen number of hidden units. A standard initialization strategy draws all weight values from a zero-mean Gaussian distribution with a small standard deviation (e.g., 0.01) and sets all bias values to exactly 0.
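The strategy described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the reference implementation: the function name init_lstm_params, the parameter-name scheme (W_xi, W_hi, b_i, and so on), the seed argument, and the default sigma=0.01 are all chosen here for illustration, and a real model would normally hold these values in a tensor library rather than nested lists.

```python
import random


def init_lstm_params(num_inputs, num_hiddens, sigma=0.01, seed=0):
    """Return a dict of LSTM parameters: Gaussian weights, zero biases.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    rng = random.Random(seed)

    def gaussian_matrix(rows, cols):
        # Each entry is drawn independently from N(0, sigma^2).
        return [[rng.gauss(0.0, sigma) for _ in range(cols)] for _ in range(rows)]

    params = {}
    # One parameter triple per gate (i = input, f = forget, o = output)
    # plus the input node (c).
    for name in ("i", "f", "o", "c"):
        # Input-to-hidden weights: shape (num_inputs, num_hiddens).
        params[f"W_x{name}"] = gaussian_matrix(num_inputs, num_hiddens)
        # Hidden-to-hidden weights: shape (num_hiddens, num_hiddens).
        params[f"W_h{name}"] = gaussian_matrix(num_hiddens, num_hiddens)
        # Bias vector of length num_hiddens, initialized to zero.
        params[f"b_{name}"] = [0.0] * num_hiddens
    return params


params = init_lstm_params(num_inputs=4, num_hiddens=8)
print(len(params))                                   # 12 parameter arrays
print(len(params["W_xi"]), len(params["W_xi"][0]))   # 4 8
```

With four components and three arrays each, the sketch produces 12 parameter arrays in total. Only the weight shapes depend on the input size; the hidden-to-hidden matrices and the biases depend on the number of hidden units alone.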
Updated 2026-05-17
Tags
D2L
Dive into Deep Learning @ D2L