Chinchilla Scaling Law
The Chinchilla scaling law provides a framework for predicting language model performance. According to this principle, the test loss per token is modeled as a constant irreducible error term plus two inverse power-law terms: one that shrinks as the model size (N) grows and another that shrinks as the training dataset size (D) grows.
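As a minimal illustration, the sketch below evaluates this decomposition in Python. The constants roughly match the fit reported in the original Chinchilla paper (Hoffmann et al., 2022); treat them as illustrative assumptions rather than part of this card.

```python
# Chinchilla-style loss decomposition: L(N, D) = E + A / N**ALPHA + B / D**BETA.
# E is the irreducible error; the other two terms shrink as model size (N)
# and dataset size (D) grow. Constants roughly follow Hoffmann et al. (2022)
# and are assumptions used only for illustration.
E = 1.69                 # irreducible loss term
A, ALPHA = 406.4, 0.34   # model-size term: A / N**ALPHA
B, BETA = 410.7, 0.28    # data-size term:  B / D**BETA

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted test loss per token for N parameters trained on D tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Example: roughly Chinchilla's own scale, 70B parameters on 1.4T tokens.
print(chinchilla_loss(70e9, 1.4e12))  # ~1.94
```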

Related
Chinchilla Scaling Law
A research team is working to improve a large language model and is using the combined power law, L(N, D) = aN^b + cD^d + ε_∞, to guide their efforts. Their analysis shows that the term aN^b, which depends on the model's parameter count (N), is currently the largest contributor to the total loss. The term cD^d, which depends on the dataset size (D), is comparatively small. To achieve the most significant reduction in loss with their limited resources, what should the team prioritize? (A sketch of this diagnostic appears after this list.)
Diagnosing LLM Training Plateaus
Optimizing LLM Training Strategy
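As referenced in the question above, here is a minimal sketch of that diagnostic: compare the magnitudes of the two power-law terms and put resources toward whichever currently dominates. The constants reuse the illustrative Chinchilla-style fit from the earlier sketch and are assumptions, not values from this card.

```python
# Compare the two power-law terms of L(N, D) and prioritize whichever
# currently dominates the loss. Constants are the illustrative
# Chinchilla-style fit from the sketch above.
def dominant_bottleneck(n_params: float, n_tokens: float) -> str:
    model_term = 406.4 / n_params**0.34  # loss attributable to finite model size
    data_term = 410.7 / n_tokens**0.28   # loss attributable to finite data
    if model_term > data_term:
        return "scale up the model (reduce the N-dependent term)"
    return "add training data (reduce the D-dependent term)"

# A 1B-parameter model trained on 10T tokens is parameter-limited:
print(dominant_bottleneck(1e9, 1e13))  # -> scale up the model ...
```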
Learn After
Chinchilla Scaling Law Formula
A research team is training a large language model and observes that the model's performance, measured by test loss, appears to be limited primarily by the number of model parameters rather than by the amount of training data. According to the principle that models test loss as a sum of two separate terms, one that decreases as model size grows and another that decreases as dataset size grows, which of the following actions would most effectively reduce the test loss in this situation?
Critique of a Model Training Strategy
Analyzing Performance Limits in Language Models