
Model Size Comparison of LLMs

The size of a Large Language Model (LLM) is determined by its parameter count, which scales with its depth (denoted L), width (denoted d), and the number of attention heads (for queries, keys, and values). For instance, an early model like GPT-1 had 0.117 billion parameters with a depth of 12 and a width of 768. Modern models have scaled massively in contrast: the LLaMA series reaches up to 405 billion parameters, while DeepSeek-V3 reaches 671 billion parameters. Other prominent model families that illustrate this variation in structural setup include Gemma2, Qwen2.5, Falcon, and Mistral.
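The relationship between depth, width, and parameter count can be sketched with a back-of-the-envelope estimate. The sketch below uses the common approximation of about 12d² parameters per Transformer layer (attention projections plus a feed-forward block with hidden size 4d) and a tied token-embedding matrix; the function name and the ~40k vocabulary size assumed for GPT-1 are illustrative assumptions, not values from the text.

```python
def estimate_params(L: int, d: int, vocab_size: int) -> int:
    """Rough parameter estimate for a decoder-only Transformer.

    Per layer: ~4*d*d for the Q, K, V, and output projections of attention,
    plus ~8*d*d for a feed-forward block with hidden size 4*d, i.e. ~12*d*d.
    The embedding matrix adds vocab_size*d (often tied with the output head).
    Biases and layer norms are ignored; they are negligible at this scale.
    """
    per_layer = 12 * d * d
    embedding = vocab_size * d
    return L * per_layer + embedding

# GPT-1-like configuration: depth 12, width 768, ~40k BPE vocabulary (assumed).
print(estimate_params(12, 768, 40478))  # ~1.16e8, close to the 0.117B cited
```

The estimate recovers roughly the 0.117 billion parameters cited for GPT-1, showing that depth and width alone pin down most of a model's size once the vocabulary is fixed.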


Updated 2026-04-19


Tags

Foundations of Large Language Models

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences