1Cademy - A language model is being fine-tuned on a dataset of customer support chat logs to improve its ability to generate helpful responses. The training process is guided by the objective function: $$ \tilde{\theta} = \arg \max_{\hat{\theta}^{+}} \sum_{(\text{query},\text{response})\in\mathcal{D}} \log \mathrm{Pr}_{\hat{\theta}^{+}}(\text{response}|\text{query}) $$ During one step of this process, the model processes a single `(query, response)` pair from the dataset. What is the role of the specific

Learn Before

Mathematical Formulation of the Supervised Fine-Tuning Objective

Multiple Choice

A language model is being fine-tuned on a dataset of customer support chat logs to improve its ability to generate helpful responses. The training process is guided by the objective function: $\tilde{\theta} = \arg \max_{\hat{\theta}^{+}} \sum_{(\text{query},\text{response})\in\mathcal{D}} \log \mathrm{Pr}_{\hat{\theta}^{+}}(\text{response}|\text{query})$ During one step of this process, the model processes a single (query, response) pair from the dataset. What is the role of the specific

Updated 2025-09-28

Contributors are:

Who are from:

Learn Before

Related