Short Answer

Deconstructing the Weak-to-Strong Fine-Tuning Objective

A common alignment method improves a powerful "strong" model by fine-tuning it on data labeled by a less capable "weak" model. The optimization objective for this process is the following:

$$\tilde{\theta} = \arg\max_{\theta} \sum_{\mathbf{x} \in X} \log \Pr^{s}_{\theta}(\hat{\mathbf{y}} \mid \mathbf{x})$$

Explain the specific role and significance of the following three components within this objective function:

  1. The term $\hat{\mathbf{y}}$
  2. The expression $\Pr^{s}_{\theta}(\cdot \mid \mathbf{x})$
  3. The operator $\arg\max_{\theta}$
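
For concreteness, below is a minimal PyTorch sketch of how this objective is commonly implemented. The function name `weak_to_strong_nll`, the tensor shapes, and the use of cross-entropy as the negative log-likelihood are illustrative assumptions, not details taken from the chapter.

```python
# Minimal sketch (assumed, not from the chapter): the strong model's
# parameters theta are updated to maximize the log-likelihood of the weak
# model's pseudo-labels y-hat, which is equivalent to minimizing the
# corresponding negative log-likelihood (cross-entropy).

import torch
import torch.nn.functional as F

def weak_to_strong_nll(strong_logits: torch.Tensor,
                       weak_labels: torch.Tensor) -> torch.Tensor:
    """Negative log Pr_theta^s(y_hat | x), summed over the batch.

    strong_logits: (batch, seq_len, vocab) -- scores from the strong model
    weak_labels:   (batch, seq_len)        -- token ids y_hat from the weak model
    """
    return F.cross_entropy(
        strong_logits.reshape(-1, strong_logits.size(-1)),
        weak_labels.reshape(-1),
        reduction="sum",  # realizes the sum over x in X from the objective
    )

# One step toward arg max_theta, done as gradient descent on the NLL
# (hypothetical model/optimizer objects, shown for illustration):
#   pseudo_labels = weak_model.generate(x)          # y-hat
#   loss = weak_to_strong_nll(strong_model(x), pseudo_labels)
#   loss.backward(); optimizer.step()
```

Minimizing this negative log-likelihood with gradient descent is the standard way to approximate the $\arg\max_{\theta}$ in the objective, since exact maximization over neural network parameters is intractable.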
