1Cademy - Recurrent Models

Learn Before

Markov Process

Theory

Recurrent Models

Recurrent models are a class of neural networks designed to process sequential data. They maintain a hidden state $h$ that summarizes the elements of the sequence seen so far. At each time step $i$, the model computes the new state $h_i$ by applying a function $f$ to the previous state $h_{i-1}$ and the current input $\text{input}_i$. This is expressed by the recurrence relation $h_i = f(h_{i-1}, \text{input}_i)$. Because each state depends on the one before it, the model retains a 'memory' of earlier inputs, which makes it suitable for sequence tasks such as natural language processing.
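
The recurrence can be illustrated with a minimal Python sketch. The source does not specify the update function f, so this example assumes a hypothetical choice: a tanh applied to a weighted sum of the previous state and the current input. The weights `w_h` and `w_x`, the scalar state, and the helper `run_recurrence` are illustrative assumptions, not part of the original definition.

```python
import math


def f(h_prev, x, w_h=0.5, w_x=1.0):
    """Hypothetical update function (an assumption, not from the source):
    squashes a weighted sum of the previous state and the current input."""
    return math.tanh(w_h * h_prev + w_x * x)


def run_recurrence(inputs, h0=0.0):
    """Apply h_i = f(h_{i-1}, input_i) over the sequence and return every state."""
    states = []
    h = h0  # initial state h_0, assumed to be zero here
    for x in inputs:
        h = f(h, x)  # the new state depends only on the old state and the current input
        states.append(h)
    return states


# The first input is non-zero; the later inputs are zero.
states = run_recurrence([1.0, 0.0, 0.0])
```

In this sketch the later states remain non-zero even though the second and third inputs are zero, because the information from the first input is carried forward through the previous state. Feeding the same inputs in a different order gives different states, which reflects the sequential nature of the computation.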

Updated 2026-04-22

References

Reference of Foundations of Large Language Models Course

Learn After

Recurrent Memory Models as a Basis for Self-Attention Alternatives
Recursive Formula for Memory as a Cumulative Average
A recurrent model with an internal state h is processing a sequence of inputs. The state is updated at each step according to the rule h_i = f(h_{i-1}, input_i), where h_{i-1} is the state from the previous step and input_i is the current input. When the model processes the third input in a sequence, what information does the term h_2 (the state after the second input) represent in the computation for the new state h_3?
Analysis of Sequential Information Processing
A neural network processes a sequence of inputs by updating a hidden state h at each step i using the formula: h_i = f(h_{i-1}, input_i). Which component in this formula is directly responsible for carrying forward a compressed summary of the entire sequence processed up to the previous step (i-1)?
Recurrent Computation of $\mu_i$ and $\nu_i$ in Linear Attention
Real-Time Applications of Recurrent Models
Resurgence of Recurrent Models in Large Language Models
Sequential Token Processing in Recurrent Models
Comparison of Efficient LLM Architectures
