Evaluating a Reward Function for Story Generation
Based on the provided scenario, what is the primary signal the model receives during training, and what is the likely impact on the model's ability to distinguish between high-quality and low-quality story segments? Explain your reasoning.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a machine learning system, a penalty is assigned to every generated output segment using the function r(x, y, ȳ) = -1, where x is the input, y is the generated output, and ȳ is the average of all possible outputs. If the system generates an output segment, what is the value of the penalty it receives according to this function?
Evaluating a Reward Function for Story Generation
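
The constant penalty in the related question can be illustrated with a minimal sketch. It assumes only the function as stated, r(x, y, ȳ) = -1. The example segments, the `penalty` name, and the comparison against the mean reward are hypothetical and added for illustration; they are not taken from the original scenario.

```python
def penalty(x, y, y_bar):
    """Constant penalty r(x, y, y_bar) = -1; all three arguments are ignored."""
    return -1


# Hypothetical segments of very different quality (illustrative only).
segments = ["a vivid, coherent opening scene", "asdf qwer zxcv"]

# Every segment receives the same penalty, regardless of its content.
rewards = [penalty("story prompt", s, None) for s in segments]

# Assumption: compare each reward to the mean reward as a simple baseline.
mean_reward = sum(rewards) / len(rewards)
differences = [r - mean_reward for r in rewards]

print(rewards)      # [-1, -1]
print(differences)  # [0.0, 0.0]
```

Because the penalty never varies, the difference from the mean is zero for every segment, so under this sketch the reward alone carries no information that separates a strong segment from a weak one.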