Concept

GRU Parameters Initialization

In a Gated Recurrent Unit (GRU) network, the learnable parameters are the weight matrices and bias vectors for the update gate, the reset gate, and the candidate hidden state. Their shapes are determined by the input size and by the number of hidden units, which is a hyperparameter. A standard initialization strategy draws every weight from a Gaussian distribution with a specified standard deviation and sets every bias to exactly 0.
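To make the shapes concrete, the strategy described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `init_gru_params`, the parameter names (`W_xz`, `W_hz`, `b_z`, and so on), the zero mean, the default standard deviation `sigma=0.01`, and the fixed seed are all assumptions chosen for the example. Each gate is assumed to have one input-to-hidden matrix, one hidden-to-hidden matrix, and one bias vector.

```python
import random


def init_gru_params(num_inputs, num_hiddens, sigma=0.01, seed=0):
    """Return GRU parameters as a dict of nested lists (hypothetical helper).

    Assumed layout per gate: an input-to-hidden matrix of shape
    (num_inputs, num_hiddens), a hidden-to-hidden matrix of shape
    (num_hiddens, num_hiddens), and a bias vector of length num_hiddens.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable

    def normal(rows, cols):
        # Weights: independent draws from a Gaussian with mean 0 (assumed)
        # and standard deviation sigma.
        return [[rng.gauss(0.0, sigma) for _ in range(cols)] for _ in range(rows)]

    def zeros(n):
        # Biases: initialized to exactly 0.
        return [0.0] * n

    params = {}
    # z: update gate, r: reset gate, h: candidate hidden state
    for gate in ("z", "r", "h"):
        params[f"W_x{gate}"] = normal(num_inputs, num_hiddens)
        params[f"W_h{gate}"] = normal(num_hiddens, num_hiddens)
        params[f"b_{gate}"] = zeros(num_hiddens)
    return params


params = init_gru_params(num_inputs=4, num_hiddens=3)
for name, value in params.items():
    # Report each parameter's shape: (rows, cols) for matrices, (n,) for biases.
    shape = (len(value), len(value[0])) if isinstance(value[0], list) else (len(value),)
    print(name, shape)
```

With `num_inputs=4` and `num_hiddens=3`, the sketch produces nine parameters: three matrices of shape (4, 3), three of shape (3, 3), and three zero vectors of length 3. In practice the same layout would normally be built with a tensor library rather than nested lists.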


Updated 2026-05-14


Tags

D2L

Dive into Deep Learning @ D2L