A research team is working to improve a large language model and is using the combined power law, L(N,D) = aN^b + cD^d + ε_∞, to guide their efforts. Their analysis shows that the term aN^b, which depends on the model's parameter count (N), is currently the largest contributor to the total loss. The term cD^d, which depends on the dataset size (D), is comparatively small. To achieve the most significant reduction in loss with their limited resources, what should the team prioritize?
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Chinchilla Scaling Law
Diagnosing LLM Training Plateaus
Optimizing LLM Training Strategy
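The trade-off in the question can be sketched numerically. The sketch below uses Chinchilla-style illustrative coefficients (a, b, c, d, ε_∞ and the chosen N, D are assumptions for demonstration, not fitted values for any particular model); with negative exponents b and d, each term shrinks as its resource grows, and when aN^b dominates the loss, scaling N yields the larger reduction:

```python
# Combined power law L(N, D) = a*N^b + c*D^d + eps_inf,
# with b < 0 and d < 0 so each term decays as its resource grows.
# All coefficients below are illustrative assumptions, not fitted values.
a, b = 406.4, -0.34    # parameter-count term a*N^b
c, d = 410.7, -0.28    # dataset-size term c*D^d
eps_inf = 1.69         # irreducible loss floor

def loss(N, D):
    """Predicted loss for N parameters trained on D tokens."""
    return a * N**b + c * D**d + eps_inf

# Hypothetical current budget: 100M parameters, 20B tokens.
N, D = 1e8, 2e10

term_N = a * N**b      # contribution of the parameter-count term
term_D = c * D**d      # contribution of the dataset-size term

# When term_N dominates, doubling N cuts the loss more than doubling D.
gain_from_2N = loss(N, D) - loss(2 * N, D)
gain_from_2D = loss(N, D) - loss(N, 2 * D)
print(f"term_N={term_N:.3f}, term_D={term_D:.3f}")
print(f"gain from 2x params: {gain_from_2N:.3f}")
print(f"gain from 2x data:   {gain_from_2D:.3f}")
```

Under these assumed coefficients the parameter term is the larger contributor, and doubling N reduces the loss by more than doubling D, which is the reasoning the question asks for: prioritize scaling the dominant term.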