
Objective for Distribution Matching in Fine-Tuning

The optimal parameters \hat{\theta} for a model are found by minimizing a loss function that quantifies the difference between the model's output distribution \text{Pr}_{s_\theta} and a target distribution \text{Pr}_t. The optimization is performed over a fine-tuning dataset D' and is formally expressed as:

\hat{\theta} = \arg \min_{\theta} \sum_{x' \in D'} \text{Loss}(\text{Pr}_t(\cdot|\cdot), \text{Pr}_{s_\theta}(\cdot|\cdot), x')

This objective is common in techniques like knowledge distillation, where a student model (s) learns to mimic a teacher model (t).
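To make the objective concrete, below is a minimal PyTorch sketch, not from the source, that instantiates \text{Loss} as the KL divergence between the teacher distribution \text{Pr}_t and the student distribution \text{Pr}_{s_\theta}, and approaches the \arg\min over \theta by gradient descent. The names teacher, student, and dataset, and the toy dimensions, are hypothetical stand-ins for illustration only.

```python
# Minimal sketch of the distribution-matching objective, assuming
# Loss = KL(teacher || student) and toy linear models as stand-ins
# for Pr_t and Pr_{s_theta}. All names here are illustrative.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, hidden = 10, 16

# The teacher Pr_t is frozen; the student Pr_{s_theta} holds the
# parameters theta being optimized.
teacher = torch.nn.Linear(hidden, vocab_size)
student = torch.nn.Linear(hidden, vocab_size)
for p in teacher.parameters():
    p.requires_grad_(False)

# Toy dataset D': each x' is a feature vector standing in for a context.
dataset = [torch.randn(hidden) for _ in range(32)]

optimizer = torch.optim.Adam(student.parameters(), lr=1e-2)

for step in range(100):
    optimizer.zero_grad()
    loss = 0.0
    for x in dataset:
        # Loss(Pr_t(.|.), Pr_{s_theta}(.|.), x'): here KL(teacher || student).
        # F.kl_div expects log-probabilities for the student (input) and
        # probabilities for the teacher (target).
        log_p_student = F.log_softmax(student(x), dim=-1)
        p_teacher = F.softmax(teacher(x), dim=-1)
        loss = loss + F.kl_div(log_p_student, p_teacher, reduction="sum")
    loss.backward()
    optimizer.step()  # one gradient step toward the argmin over theta
```

Note that only the student's parameters receive gradients; freezing the teacher mirrors the formula, where the minimization runs over \theta alone while \text{Pr}_t stays fixed.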


Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
