Model Size Comparison of LLMs
The size of a Large Language Model (LLM) is determined by its number of parameters, which scales with its depth (the number of Transformer layers), its width (the hidden dimension), and the number of attention heads used for queries, keys, and values. For instance, an early model like GPT-1 had 0.117 billion parameters, with a depth of 12 and a width of 768. Modern models have scaled up massively: the LLaMA series reaches 405 billion parameters, and DeepSeek-V3 has 671 billion. Other prominent model families that illustrate this variation in structural setup include Gemma 2, Qwen2.5, Falcon, and Mistral.
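To see how depth and width drive parameter count, the sketch below estimates the size of a GPT-style Transformer from those two numbers. It is a rough back-of-the-envelope formula that ignores biases and layer-norm parameters; the function name and the `ffn_mult` and `context` arguments are illustrative assumptions, and the GPT-1 vocabulary size (~40K BPE tokens) and context length (512) are used for the check.

```python
def transformer_params(depth, width, vocab_size, ffn_mult=4, context=512):
    """Rough parameter count for a GPT-style decoder (biases and
    layer norms omitted)."""
    # token embeddings + learned positional embeddings
    embed = vocab_size * width + context * width
    # per layer: Q, K, V, O attention projections (4 * d^2)
    # plus a feed-forward block of two d x (ffn_mult * d) matrices
    per_layer = 4 * width**2 + 2 * ffn_mult * width**2
    return embed + depth * per_layer

# GPT-1: depth 12, width 768, ~40K BPE vocabulary
print(transformer_params(12, 768, 40478))  # ~0.116 billion, close to the reported 0.117B
```

Because the per-layer term grows quadratically in the width and linearly in the depth, scaling either dimension (as the LLaMA or DeepSeek families do) quickly pushes the total into the hundreds of billions.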
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Computing Sciences