Comparison of Efficient LLM Architectures
A comparison of efficient Large Language Model (LLM) architectures highlights their varying approaches to handling sequence context. Key architectures include self-attention, sparse attention, linear attention, and recurrent models. These models differ primarily in the cached state they maintain when producing an output at a given position i. For example, recurrent models use a recurrent cell f to sequentially update a fixed-size internal state, which distinguishes them from the attention-based variants, which cache past queries, keys, or values in different ways.
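To make the contrast concrete, the following minimal Python sketch (not from the course material) compares the growing key/value cache of self-attention with the fixed-size state update h_i = f(h_{i-1}, input_i) of a recurrent cell. The dimension d_model, the stand-in projections, and the particular choice of f (a tanh of a sum) are illustrative assumptions, not the book's definitions.

```python
# Illustrative sketch: how the cached state differs between a self-attention
# layer and a recurrent model when processing one token at a time.
import numpy as np

d_model = 8  # assumed hidden size for the example

def attention_step(kv_cache, x):
    """Self-attention caches every past key/value, so memory grows with position i."""
    k, v = x, x  # stand-in projections; real models apply learned W_k, W_v
    kv_cache.append((k, v))
    return kv_cache

def recurrent_step(h_prev, x):
    """A recurrent cell f folds the new input into a fixed-size state:
    h_i = f(h_{i-1}, input_i); memory stays constant regardless of i."""
    return np.tanh(h_prev + x)  # a simple, assumed choice of f

kv_cache = []              # grows by one (k, v) pair per token
h = np.zeros(d_model)      # fixed-size initial state h_0
for token in np.random.randn(5, d_model):
    kv_cache = attention_step(kv_cache, token)
    h = recurrent_step(h, token)

print(len(kv_cache), h.shape)  # 5 cached pairs vs. a single (8,)-vector state
```

After five tokens the attention cache holds five key/value pairs, while the recurrent model still holds only one vector, which is the core trade-off the comparison above describes.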
Tags
Foundations of Large Language Models
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Recurrent Memory Models as a Basis for Self-Attention Alternatives
Recursive Formula for Memory as a Cumulative Average
A recurrent model with an internal state h is processing a sequence of inputs. The state is updated at each step according to the rule h_i = f(h_{i-1}, input_i), where h_{i-1} is the state from the previous step and input_i is the current input. When the model processes the third input in a sequence, what information does the term h_2 (the state after the second input) represent in the computation for the new state h_3?
Analysis of Sequential Information Processing
A neural network processes a sequence of inputs by updating a hidden state h at each step i using the formula h_i = f(h_{i-1}, input_i). Which component in this formula is directly responsible for carrying forward a compressed summary of the entire sequence processed up to the previous step (i-1)?
Recurrent Computation of and in Linear Attention
Real-Time Applications of Recurrent Models
Resurgence of Recurrent Models in Large Language Models
Sequential Token Processing in Recurrent Models