Short Answer

Comparing Single-Variable Scaling Functions

A research team has developed two separate mathematical functions to model their language model's performance. Function A describes the model's final loss solely as a function of the training dataset size (while holding model size constant). Function B describes the model's final loss solely as a function of the number of model parameters (while holding dataset size constant). Explain why relying on only one of these functions could lead to a suboptimal training strategy for a new, larger model.
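One way to explore the question numerically is sketched below. It assumes, purely for illustration, that both functions are slices of a single joint loss surface of the additive power-law form L(N, D) = E + A / N^alpha + B / D^beta, and that training compute is roughly 6 * N * D. The functional form, every constant, the compute rule, and all function names are hypothetical placeholders, not values or methods taken from the research team described above.

```python
# Hypothetical joint loss surface: L(N, D) = E + A / N**ALPHA + B / D**BETA.
# All constants are illustrative placeholders, not fitted values.
E, A, B, ALPHA, BETA = 1.7, 400.0, 410.0, 0.34, 0.28


def joint_loss(n_params: float, n_tokens: float) -> float:
    """Loss as a function of both model size N and dataset size D."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def function_a(n_tokens: float, fixed_params: float = 1e8) -> float:
    """Function A: loss vs. dataset size, model size held constant."""
    return joint_loss(fixed_params, n_tokens)


def function_b(n_params: float, fixed_tokens: float = 2e9) -> float:
    """Function B: loss vs. model size, dataset size held constant."""
    return joint_loss(n_params, fixed_tokens)


def best_split(compute: float, steps: int = 200) -> tuple[float, float, float]:
    """Grid-search N on a log scale, with D set by the assumed rule compute = 6 * N * D."""
    best = None
    for i in range(steps + 1):
        n_params = 10 ** (6 + 6 * i / steps)  # N from 1e6 to 1e12
        n_tokens = compute / (6 * n_params)
        loss = joint_loss(n_params, n_tokens)
        if best is None or loss < best[0]:
            best = (loss, n_params, n_tokens)
    return best


# Budget: 100x the compute of the baseline run (N = 1e8, D = 2e9).
budget = 100 * 6 * 1e8 * 2e9

# Strategy guided only by Function A: spend the whole budget on more data.
data_only = function_a(budget / (6 * 1e8))

# Strategy guided only by Function B: spend the whole budget on more parameters.
params_only = function_b(budget / (6 * 2e9))

# Strategy guided by the joint surface: choose N and D together.
balanced = best_split(budget)

print(f"data only:   {data_only:.3f}")
print(f"params only: {params_only:.3f}")
print(f"balanced:    {balanced[0]:.3f} (N={balanced[1]:.2e}, D={balanced[2]:.2e})")
```

Under these assumed constants, each single-variable strategy runs into the term it holds fixed: the loss cannot fall below the contribution of the variable that was never scaled, whereas the joint search trades the two off against each other under the same budget.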

Updated 2025-10-06

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science