Chinchilla Scaling Law
The Chinchilla scaling law provides a framework for predicting language model performance. According to this principle, the test loss per token is modeled as a constant irreducible error term plus two inverse power-law terms: one that shrinks as the model size (N) grows and another that shrinks as the training dataset size (D) grows.
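As a minimal illustration, the sketch below evaluates this decomposition in Python. The constants roughly match the fit reported in the original Chinchilla paper (Hoffmann et al., 2022); treat them as illustrative assumptions rather than part of this card.

```python
# Chinchilla-style loss decomposition: L(N, D) = E + A / N**ALPHA + B / D**BETA.
# E is the irreducible error; the other two terms shrink as model size (N)
# and dataset size (D) grow. Constants roughly follow Hoffmann et al. (2022)
# and are assumptions used only for illustration.
E = 1.69                 # irreducible loss term
A, ALPHA = 406.4, 0.34   # model-size term: A / N**ALPHA
B, BETA = 410.7, 0.28    # data-size term:  B / D**BETA

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted test loss per token for N parameters trained on D tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Example: roughly Chinchilla's own scale, 70B parameters on 1.4T tokens.
print(chinchilla_loss(70e9, 1.4e12))  # ~1.94
```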

Related
Chinchilla Scaling Law
A research team is working to improve a large language model and is using the combined power law, L(N, D) = aN^b + cD^d + ε_∞, to guide their efforts. Their analysis shows that the term aN^b, which depends on the model's parameter count (N), is currently the largest contributor to the total loss. The term cD^d, which depends on the dataset size (D), is comparatively small. To achieve the most significant reduction in loss with their limited resources, what should the team prioritize? (A sketch of this diagnostic appears after this list.)
Diagnosing LLM Training Plateaus
Optimizing LLM Training Strategy
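As referenced in the question above, here is a minimal sketch of that diagnostic: compare the magnitudes of the two power-law terms and put resources toward whichever currently dominates. The constants reuse the illustrative Chinchilla-style fit from the earlier sketch and are assumptions, not values from this card.

```python
# Compare the two power-law terms of L(N, D) and prioritize whichever
# currently dominates the loss. Constants are the illustrative
# Chinchilla-style fit from the sketch above.
def dominant_bottleneck(n_params: float, n_tokens: float) -> str:
    model_term = 406.4 / n_params**0.34  # loss attributable to finite model size
    data_term = 410.7 / n_tokens**0.28   # loss attributable to finite data
    if model_term > data_term:
        return "scale up the model (reduce the N-dependent term)"
    return "add training data (reduce the D-dependent term)"

# A 1B-parameter model trained on 10T tokens is parameter-limited:
print(dominant_bottleneck(1e9, 1e13))  # -> scale up the model ...
```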
Learn After
Chinchilla Scaling Law Formula
A research team is training a large language model and observes that the model's performance, measured by test loss, appears to be limited primarily by the number of model parameters rather than by the amount of training data. According to the principle that models test loss as a sum of two separate terms, one that decreases as model size grows and another that decreases as dataset size grows, which of the following actions would most effectively reduce the test loss in this situation?
Critique of a Model Training Strategy
Analyzing Performance Limits in Language Models