Comparison of Efficient LLM Architectures

A comparison of efficient Large Language Model (LLM) architectures highlights their differing approaches to handling sequence context. Key architectures include self-attention, sparse attention, linear attention, and recurrent models. They differ primarily in the cached state they maintain when producing an output at a given position $i$. For example, recurrent models use a recurrent cell, denoted $f(\cdot)$, to sequentially update a fixed-size internal state, which distinguishes them from attention-based models that instead cache past queries, keys, or values in different ways.
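
To make the distinction concrete, the following minimal NumPy sketch (an illustration under assumed toy dimensions, not code from the text) contrasts an attention-style decoder, whose key/value cache grows with the number of positions seen, with a recurrent decoder, whose state $f(h, x)$ stays a fixed size regardless of sequence length.

```python
import numpy as np

d = 8  # assumed hidden size, for illustration only

class AttentionDecoderState:
    """Caches all past keys/values; the output at position i attends over them."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, x, Wq, Wk, Wv):
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        self.keys.append(k)           # cache grows linearly with sequence length
        self.values.append(v)
        K = np.stack(self.keys)       # shape (i+1, d)
        V = np.stack(self.values)
        scores = K @ q / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ V            # weighted sum over all cached values

class RecurrentDecoderState:
    """Keeps one fixed-size hidden state, updated sequentially by a cell f(.)."""
    def __init__(self):
        self.h = np.zeros(d)

    def step(self, x, Wh, Wx):
        # f(h, x): constant memory no matter how many tokens have been seen
        self.h = np.tanh(self.h @ Wh + x @ Wx)
        return self.h

rng = np.random.default_rng(0)
Wq, Wk, Wv, Wh, Wx = (rng.standard_normal((d, d)) * 0.1 for _ in range(5))
attn, rec = AttentionDecoderState(), RecurrentDecoderState()
for x in rng.standard_normal((5, d)):   # five toy input positions
    y_attn = attn.step(x, Wq, Wk, Wv)
    y_rec = rec.step(x, Wh, Wx)
print(len(attn.keys), rec.h.shape)      # 5 cached keys vs. one (8,) state
```

The contrast mirrors the comparison above: sparse and linear attention variants sit between these two extremes by shrinking or compressing what the attention side of this sketch would have to cache.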

Updated 2026-04-22

Tags

Foundations of Large Language Models

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences