Learn Before
Sequential Token Processing in Recurrent Models
In long sequence modeling, recurrent models operate by reading one or a few tokens at a time. They use these inputs to update their internal recurrent state and then discard the inputs before the next tokens arrive. At any given step, the model generates its output based solely on the current recurrent state, rather than referring back to all previous states or past inputs.
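The update loop described above can be sketched in a few lines. This is a minimal illustration, not code from the course: the update rule f is assumed here to be an exponential moving average, chosen only because it is a simple function of the previous state and the current input.

```python
# Sketch of sequential token processing in a recurrent model.
# The state update h_i = f(h_{i-1}, input_i) is illustrated with an
# exponential moving average as f (an assumption for this example).

def update_state(prev_state, token_embedding, alpha=0.9):
    """One recurrent step: blend the previous state with the current input."""
    return [alpha * h + (1 - alpha) * x
            for h, x in zip(prev_state, token_embedding)]

def run(sequence, state_dim=4):
    state = [0.0] * state_dim          # initial state h_0
    for token in sequence:             # read one token at a time...
        state = update_state(state, token)
        # ...then the token is discarded; only `state` is kept.
    return state                       # output depends solely on the final state

# Two toy token embeddings; memory of the first token persists in the state.
tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
final = run(tokens)
```

Note that at every step the model carries a fixed-size state rather than the growing list of past inputs, which is what gives recurrent models constant memory per step on long sequences.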
Tags
Foundations of Large Language Models
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Recurrent Memory Models as a Basis for Self-Attention Alternatives
Recursive Formula for Memory as a Cumulative Average
A recurrent model with an internal state h is processing a sequence of inputs. The state is updated at each step according to the rule h_i = f(h_{i-1}, input_i), where h_{i-1} is the state from the previous step and input_i is the current input. When the model processes the third input in a sequence, what information does the term h_2 (the state after the second input) represent in the computation for the new state h_3?
Analysis of Sequential Information Processing
A neural network processes a sequence of inputs by updating a hidden state h at each step i using the formula h_i = f(h_{i-1}, input_i). Which component in this formula is directly responsible for carrying forward a compressed summary of the entire sequence processed up to the previous step (i-1)?
Recurrent Computation of and in Linear Attention
Real-Time Applications of Recurrent Models
Resurgence of Recurrent Models in Large Language Models
Sequential Token Processing in Recurrent Models
Comparison of Efficient LLM Architectures