Optimizing LLM Training Budget
A machine learning team has developed two simplified mathematical functions that model their language model's expected loss as a function of either the number of model parameters (N) or the training dataset size (D). Their fixed budget allows them either to double the parameter count or to double the dataset size, but not both. Given the functions below, which single action should they take to achieve the greater reduction in model loss? Justify your choice by showing which option results in the lower final loss.
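The card's concrete functions were not captured here, so the sketch below is illustrative only. It assumes Kaplan-style single-variable power laws, L(N) = (N_c / N)^α_N and L(D) = (D_c / D)^α_D, with the constants from Kaplan et al. (2020) as hypothetical stand-ins, and shows how the doubling comparison works mechanically.

```python
# Minimal sketch, NOT the card's actual functions (those were not captured).
# Assumes Kaplan-style power laws with published 2020 constants as stand-ins:
#   L(N) = (N_c / N)**alpha_N,   L(D) = (D_c / D)**alpha_D

def loss_from_params(n, n_c=8.8e13, alpha_n=0.076):
    """Hypothetical loss as a function of parameter count N."""
    return (n_c / n) ** alpha_n

def loss_from_data(d, d_c=5.4e13, alpha_d=0.095):
    """Hypothetical loss as a function of dataset size D (in tokens)."""
    return (d_c / d) ** alpha_d

# Hypothetical starting point: 1B parameters, 20B training tokens.
N, D = 1e9, 2e10

for label, before, after in [
    ("double N", loss_from_params(N), loss_from_params(2 * N)),
    ("double D", loss_from_data(D), loss_from_data(2 * D)),
]:
    print(f"{label}: {before:.4f} -> {after:.4f} (x{after / before:.4f})")

# Doubling the input of a pure power law multiplies the loss by 2**(-alpha),
# so the option with the larger exponent gives the larger fractional reduction:
# here 2**(-0.095) < 2**(-0.076), i.e. doubling D wins under these constants.
```

With the card's real functions the procedure is the same: evaluate each function at its doubled argument and compare the two resulting losses directly.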
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Absence of a Universal Scaling Law
A research team is developing a new language model. They train several versions of the model, each with a different number of parameters, while keeping the training dataset size fixed. They plot the final training loss for each model version against its parameter count. The resulting graph shows a consistent, downward-curving trend: as the number of parameters increases, the loss decreases, but each further increase yields a smaller improvement. Based on this observation, what is the most accurate conclusion the team can draw? (A small numeric sketch of this diminishing-returns pattern appears after this list.)
Comparing Single-Variable Scaling Functions
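For the related question above, here is a minimal sketch under assumed values; the power-law form L(N) = a·N^(−b) and its constants are hypothetical, not taken from the card:

```python
# Illustrative only: a hypothetical power-law fit L(N) = a * N**(-b), with
# made-up constants, showing why each parameter increase buys less improvement.
a, b = 10.0, 0.1

prev_loss = None
for n in [1e7, 1e8, 1e9, 1e10]:
    loss = a * n ** (-b)
    note = "" if prev_loss is None else f"  (improvement: {prev_loss - loss:.3f})"
    print(f"N = {n:.0e}: loss = {loss:.3f}{note}")
    prev_loss = loss

# Improvements shrink (~0.410, ~0.326, ~0.259 per 10x in N): a smooth,
# flattening trend that supports a fitted scaling relationship over the
# tested range, not a universal law beyond it.
```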