1Cademy - Predicting Performance Improvement from Model Scaling

Learn Before

Power Law Fit for Test Loss vs. Model and Dataset Size

Short Answer

Predicting Performance Improvement from Model Scaling

A research lab's language models follow the power-law relationship $L(N) \propto N^{-0.076}$, where $L$ is the test loss and $N$ is the number of parameters. If the lab increases the number of parameters in its next model by a factor of 100, what is the expected percentage decrease in the test loss? Show your calculation.
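One way to check the arithmetic is the minimal sketch below. It assumes the power law holds exactly across the whole scaling range, so the unknown proportionality constant cancels in the ratio $L(100N)/L(N) = 100^{-0.076}$. The function name `loss_ratio` and its parameters are hypothetical, chosen only for illustration.

```python
def loss_ratio(scale_factor: float, exponent: float = 0.076) -> float:
    """Return L(k*N) / L(N) when L(N) is proportional to N**(-exponent).

    The proportionality constant cancels in the ratio, so only the
    scale factor k and the exponent matter. (Hypothetical helper.)
    """
    return scale_factor ** (-exponent)


# Scaling the parameter count by 100x, as in the question.
ratio = loss_ratio(100)                 # 100**(-0.076) = 10**(-0.152)
percent_decrease = (1 - ratio) * 100    # relative drop in test loss

print(f"loss ratio: {ratio:.4f}")                  # loss ratio: 0.7047
print(f"percent decrease: {percent_decrease:.1f}%")  # percent decrease: 29.5%
```

Under these assumptions the new loss is about 70.5% of the old one, which is a decrease of roughly 29.5%.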

Updated 2025-10-10
