Evaluating an Alternative Fine-Tuning Objective
Based on the provided scenario, critique Objective B and explain the mathematical and practical reasons why Objective A is the preferred standard for supervised fine-tuning.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Notational Simplification in Fine-Tuning Formulas
A language model is being fine-tuned on a dataset of customer support chat logs to improve its ability to generate helpful responses. The training process is guided by the objective function:

θ̂ = argmax_θ Σ_{(query, response) ∈ D} log Pr(response | query)

During one step of this process, the model processes a single (query, response) pair from the dataset. What is the role of the specific component log Pr(response | query) for this single pair?

The following equation represents the primary goal of a common model training process. Match each mathematical symbol from the equation to its correct description.
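The component log Pr(response | query) decomposes into a sum of per-token conditional log-probabilities, log Pr(y_t | query, y_<t). A minimal sketch of that decomposition, using a hypothetical lookup-table "model" in place of a real neural language model (the table entries and function name are illustrative assumptions, not part of the original note):

```python
import math

# Hypothetical next-token model: Pr(token | context) as a lookup table.
# A real LM would compute these probabilities with a neural network.
COND_PROBS = {
    ("how can i reset", "go"): 0.6,
    ("how can i reset go", "to"): 0.7,
    ("how can i reset go to", "settings"): 0.5,
}

def log_prob_response_given_query(query_tokens, response_tokens):
    """log Pr(response | query) = sum_t log Pr(y_t | query, y_<t)."""
    context = list(query_tokens)
    total = 0.0
    for tok in response_tokens:
        p = COND_PROBS[(" ".join(context), tok)]  # Pr(y_t | query, y_<t)
        total += math.log(p)                      # accumulate log-probability
        context.append(tok)                       # condition on tokens so far
    return total

lp = log_prob_response_given_query(
    ["how", "can", "i", "reset"], ["go", "to", "settings"]
)
# lp = log(0.6) + log(0.7) + log(0.5): the score this single pair
# contributes to the training objective before summing over the dataset.
```

For this single pair, the value returned is the quantity the optimizer pushes upward: gradient ascent on it (or descent on its negation, the cross-entropy loss) increases the probability the model assigns to the observed response given the query.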
Token-Level Conditional Log-Probability in Supervised Fine-Tuning