Comparison of Efficient LLM Architectures
A comparison of efficient Large Language Model (LLM) architectures highlights their varying approaches to handling sequence context. Key architectures include self-attention, sparse attention, linear attention, and recurrent models. These models differ primarily in the cached state they maintain when producing an output at a given position i. For example, recurrent models use a recurrent cell f to sequentially update a fixed-size internal state, which distinguishes them from the attention-based variants, which cache past queries, keys, or values in different ways.
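To make the contrast concrete, the following minimal Python sketch (not from the course material) compares the growing key/value cache of self-attention with the fixed-size state update h_i = f(h_{i-1}, input_i) of a recurrent cell. The dimension d_model, the stand-in projections, and the particular choice of f (a tanh of a sum) are illustrative assumptions, not the book's definitions.

```python
# Illustrative sketch: how the cached state differs between a self-attention
# layer and a recurrent model when processing one token at a time.
import numpy as np

d_model = 8  # assumed hidden size for the example

def attention_step(kv_cache, x):
    """Self-attention caches every past key/value, so memory grows with position i."""
    k, v = x, x  # stand-in projections; real models apply learned W_k, W_v
    kv_cache.append((k, v))
    return kv_cache

def recurrent_step(h_prev, x):
    """A recurrent cell f folds the new input into a fixed-size state:
    h_i = f(h_{i-1}, input_i); memory stays constant regardless of i."""
    return np.tanh(h_prev + x)  # a simple, assumed choice of f

kv_cache = []              # grows by one (k, v) pair per token
h = np.zeros(d_model)      # fixed-size initial state h_0
for token in np.random.randn(5, d_model):
    kv_cache = attention_step(kv_cache, token)
    h = recurrent_step(h, token)

print(len(kv_cache), h.shape)  # 5 cached pairs vs. a single (8,)-vector state
```

After five tokens the attention cache holds five key/value pairs, while the recurrent model still holds only one vector, which is the core trade-off the comparison above describes.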
Tags
Foundations of Large Language Models
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Recurrent Memory Models as a Basis for Self-Attention Alternatives
Recursive Formula for Memory as a Cumulative Average
A recurrent model with an internal state h is processing a sequence of inputs. The state is updated at each step according to the rule h_i = f(h_{i-1}, input_i), where h_{i-1} is the state from the previous step and input_i is the current input. When the model processes the third input in a sequence, what information does the term h_2 (the state after the second input) represent in the computation for the new state h_3?
Analysis of Sequential Information Processing
A neural network processes a sequence of inputs by updating a hidden state h at each step i using the formula h_i = f(h_{i-1}, input_i). Which component in this formula is directly responsible for carrying forward a compressed summary of the entire sequence processed up to the previous step (i-1)?
Recurrent Computation of and in Linear Attention
Real-Time Applications of Recurrent Models
Resurgence of Recurrent Models in Large Language Models
Sequential Token Processing in Recurrent Models