
Objective for Distribution Matching in Fine-Tuning

The optimal parameters \hat{\theta} for a model are found by minimizing a loss function that quantifies the difference between the model's output distribution \text{Pr}_{s_\theta} and a target distribution \text{Pr}_t. The optimization is performed over a fine-tuning dataset D' and is formally expressed as:

\hat{\theta} = \arg \min_{\theta} \sum_{x' \in D'} \text{Loss}(\text{Pr}_t(\cdot|\cdot), \text{Pr}_{s_\theta}(\cdot|\cdot), x')

This objective is common in techniques like knowledge distillation, where a student model (s) learns to mimic a teacher model (t).
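To make the objective concrete, below is a minimal PyTorch sketch, not from the source, that instantiates \text{Loss} as the KL divergence between the teacher distribution \text{Pr}_t and the student distribution \text{Pr}_{s_\theta}, and approaches the \arg\min over \theta by gradient descent. The names teacher, student, and dataset, and the toy dimensions, are hypothetical stand-ins for illustration only.

```python
# Minimal sketch of the distribution-matching objective, assuming
# Loss = KL(teacher || student) and toy linear models as stand-ins
# for Pr_t and Pr_{s_theta}. All names here are illustrative.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, hidden = 10, 16

# The teacher Pr_t is frozen; the student Pr_{s_theta} holds the
# parameters theta being optimized.
teacher = torch.nn.Linear(hidden, vocab_size)
student = torch.nn.Linear(hidden, vocab_size)
for p in teacher.parameters():
    p.requires_grad_(False)

# Toy dataset D': each x' is a feature vector standing in for a context.
dataset = [torch.randn(hidden) for _ in range(32)]

optimizer = torch.optim.Adam(student.parameters(), lr=1e-2)

for step in range(100):
    optimizer.zero_grad()
    loss = 0.0
    for x in dataset:
        # Loss(Pr_t(.|.), Pr_{s_theta}(.|.), x'): here KL(teacher || student).
        # F.kl_div expects log-probabilities for the student (input) and
        # probabilities for the teacher (target).
        log_p_student = F.log_softmax(student(x), dim=-1)
        p_teacher = F.softmax(teacher(x), dim=-1)
        loss = loss + F.kl_div(log_p_student, p_teacher, reduction="sum")
    loss.backward()
    optimizer.step()  # one gradient step toward the argmin over theta
```

Note that only the student's parameters receive gradients; freezing the teacher mirrors the formula, where the minimization runs over \theta alone while \text{Pr}_t stays fixed.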


Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
