Learn Before

Chinchilla Scaling Law Formula

Multiple Choice

A research team is using the following empirical formula to guide its training strategy for a large language model, where $\mathcal{L}$ is the test loss, $N$ is the model size, and $D$ is the dataset size:

$$ \mathcal{L}(N,D) = \frac{406.4}{N^{0.34}} + \frac{410.7}{D^{0.28}} + 1.69 $$

According to this formula, which of the following strategies is predicted to achieve the most substantial reduction in test loss?
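One way to compare candidate strategies is to evaluate the formula directly. The sketch below is only an illustration: the baseline values `N0` and `D0` and the doubling factors are hypothetical assumptions, not taken from the question, and the function names are invented for this example. It computes how much the predicted loss drops when only the model size or only the dataset size is scaled.

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted test loss from the formula quoted in the question."""
    return 406.4 / n_params**0.34 + 410.7 / n_tokens**0.28 + 1.69


def loss_reduction(n_params: float, n_tokens: float,
                   n_scale: float = 1.0, d_scale: float = 1.0) -> float:
    """Drop in predicted loss after scaling N by n_scale and D by d_scale."""
    before = chinchilla_loss(n_params, n_tokens)
    after = chinchilla_loss(n_params * n_scale, n_tokens * d_scale)
    return before - after


# Hypothetical baseline (an assumption for illustration only):
# 1e9 parameters trained on 2e10 tokens.
N0 = 1e9
D0 = 2e10

gain_from_model = loss_reduction(N0, D0, n_scale=2.0)  # double N, keep D
gain_from_data = loss_reduction(N0, D0, d_scale=2.0)   # double D, keep N

print(f"baseline loss:         {chinchilla_loss(N0, D0):.4f}")
print(f"reduction, 2x model:   {gain_from_model:.4f}")
print(f"reduction, 2x dataset: {gain_from_data:.4f}")
```

Which strategy wins depends on the starting point: each term shrinks by a fixed fraction when its variable is scaled, so the larger of the two reducible terms at the baseline tends to offer the larger absolute gain. The constant 1.69 is unaffected by either choice and acts as a floor on the predicted loss.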

Updated 2025-10-08
