A development team trains a language model to generate helpful code snippets. To improve its performance, they also build a separate model that automatically assigns a numerical score from 1 to 10 to each generated snippet, with 10 being the most helpful. What is the most critical factor that determines whether this scoring model can reliably identify helpful code?
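The generate-then-score pipeline the question describes can be sketched as follows. This is a minimal illustration only: the `score_snippet` function below is a hand-written toy heuristic standing in for the separate trained scoring model, and all names are hypothetical. In a real system the score would come from a model trained on human-labeled helpfulness judgments, which is why the quality of that labeled data is central to the question.

```python
def score_snippet(snippet: str) -> int:
    """Toy stand-in for a learned scoring model: returns a 1-10 score.

    A crude heuristic (comments present, concise, no unfinished markers)
    substitutes here for a model trained on human preference labels.
    """
    score = 5
    if '"""' in snippet or "#" in snippet:   # documented code reads as more helpful
        score += 2
    if len(snippet.splitlines()) <= 20:      # concise snippets score higher
        score += 2
    if "TODO" in snippet:                    # unfinished code scores lower
        score -= 3
    return max(1, min(10, score))            # clamp to the 1-10 scale

# The generator proposes candidate snippets; the scorer ranks them.
candidates = [
    'def add(a, b):\n    """Return the sum of a and b."""\n    return a + b',
    "def add(a, b): pass  # TODO: implement",
]
best = max(candidates, key=score_snippet)
print(score_snippet(candidates[0]), score_snippet(candidates[1]))  # → 9 6
```

Note that the pipeline itself is indifferent to how the score is produced; whether the scorer reliably identifies helpful code depends entirely on what it was trained on, not on the scoring loop.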
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Improving a Chatbot's Politeness
A development team wants to improve a language model's ability to generate helpful and safe responses. They decide to use a system where a separate, trained model provides a quality score for each generated response. Arrange the following steps in the logical order required to implement and use this system.