1Cademy - A research team is training a large language model and observes that the models performance, measured by test loss, seems to be primarily limited by the number of model parameters rather than the amount of training data. According to the principle that models test loss as a sum of two separate terms—one that decreases as model size grows and another that decreases as dataset size grows—which of the following actions would most effectively reduce the test loss in this specific situation?

Learn Before

Chinchilla Scaling Law

Multiple Choice

A research team is training a large language model and observes that the model's performance, measured by test loss, seems to be primarily limited by the number of model parameters rather than the amount of training data. According to the principle that models test loss as a sum of two separate terms—one that decreases as model size grows and another that decreases as dataset size grows—which of the following actions would most effectively reduce the test loss in this specific situation?

Updated 2025-09-28

Contributors are:

Who are from:

Learn Before

Related