Multiple Choice

A development team is building a dataset to improve a language model's ability to follow instructions. Their automated process is: 1) For each instruction, generate one response from a powerful language model. 2) Use another prompt to ask the same model to score the helpfulness of that single response on a scale of 1 to 5. The team observes that the model they are training with this data is not improving as expected. What is the most likely flaw in their data generation process?
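
The two-step process described in the question can be sketched as follows. This is a minimal illustration, not the team's actual code: `build_dataset`, `stub_model`, and the rating prompt wording are all hypothetical, and `stub_model` is a canned stand-in for the powerful language model.

```python
from typing import Callable, Dict, List, Union

Record = Dict[str, Union[str, int]]


def build_dataset(instructions: List[str], model: Callable[[str], str]) -> List[Record]:
    """Build (instruction, response, score) records as the question describes."""
    records: List[Record] = []
    for instruction in instructions:
        # Step 1: generate exactly one response per instruction.
        response = model(instruction)

        # Step 2: ask the same model to rate that single response from 1 to 5.
        rating_prompt = (
            "Rate the helpfulness of the response on a scale of 1 to 5.\n"
            f"Instruction: {instruction}\n"
            f"Response: {response}\n"
            "Score:"
        )
        score = int(model(rating_prompt).strip())
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range: {score}")

        records.append(
            {"instruction": instruction, "response": response, "score": score}
        )
    return records


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns canned text."""
    if prompt.startswith("Rate the helpfulness"):
        return "4"
    return f"Stub answer to: {prompt}"


dataset = build_dataset(["Summarize the report.", "Translate the memo."], stub_model)
```

In this sketch, each instruction ends up paired with one response and one absolute score, both produced by the same model.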

Updated 2025-10-02

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science