1Cademy - Parameter Selection via Loss Minimization

Learn Before

Training Objective as Loss Minimization over a Dataset

Case Study

Parameter Selection via Loss Minimization

Based on the training objective of minimizing the total error over the entire dataset, which set of parameters (Model A or Model B) would the training process select? Justify your answer by calculating the total error for each model.
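The selection rule in this case study can be written as a few lines of code. This is a minimal sketch, not the course's own material. The per-example error values are HYPOTHETICAL placeholders, because the case study's actual numbers for Model A and Model B are not reproduced here. The helper names `total_loss` and `select_parameters` are also illustrative. The code sums each candidate's per-example loss over the whole dataset and returns the candidate with the smallest total, which is the $\arg\min$ in the training objective.

```python
# Minimal sketch: choose the parameter set whose TOTAL loss over the dataset is smallest.
# All per-example error values used below are hypothetical placeholders.

from typing import Dict, List, Tuple


def total_loss(per_example_losses: List[float]) -> float:
    """Sum Loss_theta(x) over every example x in the dataset D."""
    return sum(per_example_losses)


def select_parameters(candidates: Dict[str, List[float]]) -> Tuple[str, Dict[str, float]]:
    """Return the candidate with minimal total loss (the argmin) plus every candidate's total."""
    totals = {name: total_loss(losses) for name, losses in candidates.items()}
    best = min(totals, key=lambda name: totals[name])
    return best, totals


if __name__ == "__main__":
    # Hypothetical per-example errors on a three-example dataset.
    candidates = {
        "Model A": [0.2, 0.9, 0.4],  # best on the first example, poor on the second
        "Model B": [0.5, 0.3, 0.6],  # never the best on the first example, but lower overall
    }
    best, totals = select_parameters(candidates)
    for name, t in totals.items():
        print(f"{name}: total error = {t:.2f}")
    print(f"Selected: {best}")
```

With these made-up numbers, Model A totals 1.50 and Model B totals 1.40, so Model B is selected even though Model A has the lower error on the first example. The objective compares sums over all of $D$, not individual examples.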

Updated 2025-10-03


Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science

A language model is trained on a dataset $D$ by finding the parameters $\hat{\theta}$ that optimize the following objective:

$$\hat{\theta} = \arg\min_{\theta} \sum_{\mathbf{x} \in D} \text{Loss}_{\theta}(\mathbf{x})$$

Which statement best analyzes the relationship between this optimization objective and the principle of Maximum Likelihood Estimation (MLE)?
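The link to MLE becomes explicit once a per-example loss is chosen. The sketch below assumes that $\text{Loss}_{\theta}(\mathbf{x})$ is the negative log-likelihood $-\log \Pr_{\theta}(\mathbf{x})$, a common choice for language models. It also assumes that the examples in $D$ are independent, so the dataset likelihood factorizes into a product. Under these assumptions, minimizing the summed loss is the same as maximizing the likelihood of the data:

```latex
\begin{aligned}
\hat{\theta}
  &= \arg\min_{\theta} \sum_{\mathbf{x} \in D} \text{Loss}_{\theta}(\mathbf{x})
   = \arg\min_{\theta} \sum_{\mathbf{x} \in D} \bigl(-\log \Pr_{\theta}(\mathbf{x})\bigr) \\
  &= \arg\max_{\theta} \sum_{\mathbf{x} \in D} \log \Pr_{\theta}(\mathbf{x})
   = \arg\max_{\theta} \prod_{\mathbf{x} \in D} \Pr_{\theta}(\mathbf{x})
\end{aligned}
```

Negating the objective turns the $\arg\min$ into an $\arg\max$. Because $\log$ is strictly increasing, maximizing the sum of log-probabilities selects the same $\theta$ as maximizing their product, which is the likelihood of $D$. If a different per-example loss is used, this equivalence need not hold.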
Deconstructing the Training Objective
