1Cademy - A team is fine-tuning a pre-trained language model using a dataset of high-quality instruction-response pairs. The training process aims to adjust the model's parameters to maximize the probability of it generating the exact target response for each given instruction. After training, the team observes that the model often produces responses that are factually correct but much shorter and less detailed than the high-quality examples in their dataset. What is the most likely reason for this behavior, given the training objective?

Learn Before

Supervised Fine-Tuning (SFT)

Multiple Choice

A team is fine-tuning a pre-trained language model using a dataset of high-quality instruction-response pairs. The training process aims to adjust the model's parameters to maximize the probability of it generating the exact target response for each given instruction. After training, the team observes that the model often produces responses that are factually correct but much shorter and less detailed than the high-quality examples in their dataset. What is the most likely reason for this behavior, given the training objective?
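
To make the training objective in the question concrete, here is a minimal sketch, assuming the usual autoregressive factorization in which the probability of a response is the product of the per-token probabilities. The function names (`sequence_log_prob`, `sft_loss`) and the probability values are hypothetical and chosen only for illustration; they are not taken from any particular library or from the team's actual setup.

```python
import math

def sequence_log_prob(token_probs):
    """Log-probability of a whole response.

    Assumes the response probability factorizes into per-token
    probabilities, so the log-probability is a sum of per-token logs.
    """
    return sum(math.log(p) for p in token_probs)

def sft_loss(batch):
    """Negative log-likelihood of the target responses, averaged over examples.

    `batch` is a list of responses; each response is a list of the
    probabilities the model assigns to its target tokens.
    Minimizing this loss maximizes the probability of the exact targets.
    """
    return -sum(sequence_log_prob(probs) for probs in batch) / len(batch)

# Hypothetical per-token probabilities for two target responses.
short_response = [0.9, 0.9, 0.9]   # 3 target tokens
long_response = [0.9] * 12         # 12 target tokens

print(sequence_log_prob(short_response))
print(sequence_log_prob(long_response))
print(sft_loss([short_response, long_response]))
```

Comparing the two printed log-probabilities shows how the sequence-level quantity behaves as the number of target tokens changes. Real training code may normalize the loss per token rather than per example; that choice is not specified in the question.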

Updated 2025-09-26
