Learn Before
Chinchilla Scaling Law Formula
Hoffmann et al. (2022) established a precise empirical equation for the Chinchilla scaling law to compute the test loss ($L$) based on the model size ($N$, in parameters) and the dataset size ($D$, in training tokens). The formulation is expressed as:

$$L(N, D) = \frac{406.4}{N^{0.34}} + \frac{410.7}{D^{0.28}} + 1.69$$

This relationship divides the overall loss into three distinct components: a model scaling term ($406.4/N^{0.34}$), a dataset scaling term ($410.7/D^{0.28}$), and a baseline irreducible error of $1.69$.
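To make the decomposition concrete, here is a minimal Python sketch (the function name and the 70B-parameter / 1.4T-token example are illustrative choices, not from the card) that evaluates each term of the fitted equation separately:

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Fitted Chinchilla equation from Hoffmann et al. (2022):
    L(N, D) = 406.4 / N^0.34 + 410.7 / D^0.28 + 1.69
    """
    model_term = 406.4 / n_params ** 0.34   # shrinks as the model grows
    data_term = 410.7 / n_tokens ** 0.28    # shrinks as the dataset grows
    irreducible = 1.69                      # baseline error, independent of N and D
    return model_term + data_term + irreducible

# Example: roughly Chinchilla's own budget, 70B parameters on 1.4T tokens.
print(chinchilla_loss(70e9, 1.4e12))  # ~1.94
```

Because the exponents differ (0.34 for $N$ vs. 0.28 for $D$), the two scaling terms shrink at different rates, which is what makes allocating a fixed compute budget between model size and data size a nontrivial choice.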

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Chinchilla Scaling Law Formula
A research team is training a large language model and observes that the model's performance, measured by test loss, seems to be limited primarily by the number of model parameters rather than by the amount of training data. According to the principle that models the test loss as a sum of two separate terms, one that decreases as model size grows and another that decreases as dataset size grows, which of the following actions would most effectively reduce the test loss in this situation?
Critique of a Model Training Strategy
Analyzing Performance Limits in Language Models
Learn After
Optimizing Training Resources
Theoretical Loss Limit with Infinite Data
A research team is using the following empirical formula to guide their training strategy for a large language model, where $L$ is the test loss, $N$ is the model size, and $D$ is the dataset size:

$$L(N, D) = \frac{406.4}{N^{0.34}} + \frac{410.7}{D^{0.28}} + 1.69$$

To achieve the most substantial reduction in test loss, which of the following strategies is predicted by this formula to be more effective?
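For readers who want to sanity-check such predictions numerically, the short sketch below (the 10B-parameter / 200B-token starting point is an assumed example, not from the card) compares doubling the model size against doubling the dataset size under the fitted equation:

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    # Fitted Chinchilla equation from Hoffmann et al. (2022).
    return 406.4 / n_params ** 0.34 + 410.7 / n_tokens ** 0.28 + 1.69

# Assumed starting point: 10B parameters, 200B tokens.
n, d = 10e9, 200e9
print(f"baseline:            {chinchilla_loss(n, d):.4f}")
print(f"double model size:   {chinchilla_loss(2 * n, d):.4f}")
print(f"double dataset size: {chinchilla_loss(n, 2 * d):.4f}")
```

The size of each reduction depends on both the magnitude of the corresponding term and its exponent, so the formula's prediction shifts with the starting values of $N$ and $D$.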