A research team is deciding between two language model sizes. Model A will have 10 billion parameters, and Model B will have 100 billion parameters. According to the empirical relationship where performance loss (L) is a function of the number of parameters (N), as shown in the formula below, which model should the team choose to achieve a lower final loss, and what is the justification?
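The referenced formula is not reproduced in this note. A commonly used form for this kind of scaling law (an assumption here, in the style of Kaplan et al.'s power-law fits, not necessarily the exact formula the card refers to) is L(N) = (N_c / N)^{α_N}, where N_c and α_N are empirically fitted constants. A minimal sketch comparing the two model sizes under that assumed form:

```python
# Hypothetical scaling-law comparison. The functional form and the
# constants below (Kaplan et al.-style fits) are assumptions for
# illustration, not values from this note.
def loss(n_params, n_c=8.8e13, alpha_n=0.076):
    """Power-law loss as a function of parameter count N."""
    return (n_c / n_params) ** alpha_n

loss_a = loss(10e9)    # Model A: 10 billion parameters
loss_b = loss(100e9)   # Model B: 100 billion parameters

# Because alpha_n > 0, loss decreases monotonically as N grows,
# so the larger model attains the lower predicted loss.
print(f"Model A (10B):  L = {loss_a:.3f}")
print(f"Model B (100B): L = {loss_b:.3f}")
```

Under any positive exponent α_N, the 100B-parameter model yields the lower loss; the specific constants only change the magnitude of the gap, not its direction.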
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Combined Power Law for LLM Loss with Model and Dataset Size
Interpreting Model Scaling Effects
Interpreting the Model Scaling Formula
Visualizing Empirical Scaling Laws for LLM Loss
Power Law Fit for Test Loss vs. Model and Dataset Size