
Math behind the simple RNNs

For each time step t in the sequence, the hidden state h^{\langle t \rangle} and the output \hat y^{\langle t \rangle} can be calculated as follows: h^{\langle t \rangle} = g_1(W h^{\langle t-1 \rangle} + U x^{\langle t \rangle} + b_h), \hat y^{\langle t \rangle} = g_2(V h^{\langle t \rangle} + b_y), where g_1, g_2 are activation functions, U, V, W are trainable parameter matrices, and b_h and b_y are bias terms; the two weighted terms inside g_1 are summed element-wise.
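The two equations can be sketched as a single forward step in NumPy. This is a minimal illustration, not a library implementation: the dimensions are made up, and tanh and softmax are assumed as common choices for g_1 and g_2.

```python
import numpy as np

# Hypothetical dimensions: hidden size 4, input size 3, output size 2.
n_h, n_x, n_y = 4, 3, 2
rng = np.random.default_rng(0)

# Trainable parameters from the equations above.
W = rng.normal(size=(n_h, n_h))   # hidden-to-hidden weights
U = rng.normal(size=(n_h, n_x))   # input-to-hidden weights
V = rng.normal(size=(n_y, n_h))   # hidden-to-output weights
b_h = np.zeros(n_h)
b_y = np.zeros(n_y)

def rnn_step(h_prev, x_t):
    """One time step: g1 = tanh, g2 = softmax (assumed activations)."""
    h_t = np.tanh(W @ h_prev + U @ x_t + b_h)  # element-wise sum inside g1
    z = V @ h_t + b_y
    y_hat = np.exp(z) / np.exp(z).sum()        # softmax over outputs
    return h_t, y_hat

h0 = np.zeros(n_h)                # initial hidden state h^<0>
x1 = rng.normal(size=n_x)         # input at t = 1
h1, y1 = rnn_step(h0, x1)
print(h1.shape, y1.shape)         # (4,) (2,)
```

Note that the softmax output \hat y^{\langle 1 \rangle} sums to 1, as expected of a probability distribution.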

When the network is trained, it not only learns the input weights U applied at each step, but also the recurrent weight matrix W of the hidden-state function. These parameters determine how much information from previous steps is carried forward to each subsequent step.


Updated 2020-10-03

Tags

Data Science
