1Cademy - A development team is building a dataset to improve a language model's ability to follow instructions. Their automated process is: 1) For each instruction, generate one response from a powerful language model. 2) Use another prompt to ask the same model to score the helpfulness of that single response on a scale of 1 to 5. The team observes that the model they are training with this data is not improving as expected. What is the most likely flaw in their data generation process?

Learn Before

Generating Preference Data Using LLMs

Multiple Choice

A development team is building a dataset to improve a language model's ability to follow instructions. Their automated process is: 1) For each instruction, generate one response from a powerful language model. 2) Use another prompt to ask the same model to score the helpfulness of that single response on a scale of 1 to 5. The team observes that the model they are training with this data is not improving as expected. What is the most likely flaw in their data generation process?
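To make the team's process concrete, here is a minimal Python sketch of the two steps exactly as the question describes them. The names `build_dataset`, `generate`, and `score` are hypothetical and appear nowhere in the original; the two stub functions stand in for calls to the language model and are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Union

Record = Dict[str, Union[str, int]]


def build_dataset(
    instructions: List[str],
    generate: Callable[[str], str],    # hypothetical: instruction -> response
    score: Callable[[str, str], int],  # hypothetical: (instruction, response) -> 1..5
) -> List[Record]:
    """Mirror the process in the question: one response and one score per instruction."""
    records: List[Record] = []
    for instruction in instructions:
        # Step 1: generate a single response for the instruction.
        response = generate(instruction)
        # Step 2: ask the same model to rate that single response from 1 to 5.
        rating = score(instruction, response)
        if not 1 <= rating <= 5:
            raise ValueError(f"score out of range: {rating}")
        records.append(
            {"instruction": instruction, "response": response, "score": rating}
        )
    return records


# Stub model calls (assumptions) so the sketch runs without a real model.
def fake_generate(instruction: str) -> str:
    return f"Response to: {instruction}"


def fake_score(instruction: str, response: str) -> int:
    return 4


dataset = build_dataset(
    ["Summarize this paragraph.", "Translate 'hello' to French."],
    fake_generate,
    fake_score,
)
```

Each record produced this way holds exactly one instruction, one response, and one absolute score, which is the shape of data the question asks you to evaluate.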

Updated 2025-10-02
